Stanford Artificial Intelligence Laboratory
Memo AIM-316

Report No. STAN-CS-78-671

NATURAL LANGUAGE PROCESSING IN AN
AUTOMATIC PROGRAMMING DOMAIN

by

Jerrold M. Ginsparg

Research sponsored by
Advanced Research Projects Agency

COMPUTER SCIENCE DEPARTMENT
Stanford University

June 1978

This document has been approved
for public release and sale; its
distribution is unlimited.





This paper is about communicating with computers in English. In particular, it describes an
interface system which allows a human user to communicate with an automatic programming
system in an English dialogue.

The interface consists of two parts. The first is a parser called Reader. Reader was designed to
facilitate writing English grammars which are nearly deterministic in that they consider a very
small number of parse paths during the processing of a sentence. This efficiency is primarily
derived from using a single parse structure to represent more than one syntactic interpretation of
the input sentence.

The second part of the interface is an interpreter which represents Reader's output in a form
that can be used by a computer program without linguistic knowledge. The interpreter is
responsible for asking questions of the user, processing the user's replies, building a
representation of the program the user's replies describe, and supplying the parser with any of
the contextual information or general knowledge it needs while parsing.

This thesis was submitted to the Department of Computer Science and the Committee on Graduate
Studies of Stanford University in partial fulfillment of the requirements for the degree of Doctor of
Philosophy.

This research was supported by the Advanced Research Projects Agency of the Department of
Defense under ARPA Order No. 2494, Contract MDA903-76-C-0206. The views and conclusions
contained in this document are those of the authors and should not be interpreted as necessarily
representing the official policies, either expressed or implied, of Stanford University, or any agency
of the U. S. Government.




ACKNOWLEDGEMENTS 


I would like to thank

my advisor, Professor Terry Winograd,

the members of my reading committee: Professor Cordell
Green and Dr. Daniel Bobrow,

the PSI group: Dave Barstow, Richard Gabriel, Elaine Kant,
Juan Ludlow, Brian McCune, Jorge Phillips and Lou Steinberg,

and Martin Brooks,

for their help in the preparation of this thesis.




Table of Contents

1. Introduction
   1.1 Organization
   1.2 Capabilities
      1.2.1 The parser
      1.2.2 The Interpreter
   1.3 Three Examples
   1.4 PSI
   1.5 An Overview
      1.5.1 Reader
      1.5.2 The Interpreter
2. Parsing
   2.1 The Basic Algorithm
   2.2 Stack structures and collapsing
   2.3 Reader's output
      2.3.1 Cases
      2.3.2 Tense markers
      2.3.3 Noun groups
      2.3.4 Choices
      2.3.5 Conventions
3. Grammar writing
   3.1 Some beginning grammars
      3.1.1 Grammar.1
      3.1.2 Grammar.2
      3.1.3 Grammar.3
      3.1.4 Grammar.4
   3.2 Grammar efficiency
      3.2.1 Nouns as modifiers
      3.2.2 Relative clauses
      3.2.3 Verbs which accept clauses
      3.2.4 Conjunctions
      3.2.5 Verbs inflected with -ed endings
4. A closer look
   4.1 Measure
      4.1.1 The semantic component
      4.1.2 The syntactic component
   4.2 Collapsing
   4.3 Formatting
      4.3.1 Noun groups
      4.3.2 Conjunctions
      4.3.3 Filling in extra cases
      4.3.4 Choices
   4.4 Parallel processing
   4.5 Other parsers
5. The Interpreter
   5.1 The results of interpretation
      5.1.1 The program specification
      5.1.2 An example and comparison
      5.1.3 Meta-comments
   5.2 The knowledge base
      5.2.1 Concepts
      5.2.2 Definitions
      5.2.3 Procedural embedding
   5.3 The processing cycle
   5.4 Matching
      5.4.1 Nouns
      5.4.2 Pronouns
      5.4.3 Matching to implicitly mentioned components
      5.4.4 Coercion
   5.5 The Reader/Interpreter interface
   5.6 Future work
      5.6.1 Tense evaluation
      5.6.2 More domain and general programming support
      5.6.3 Building up more concepts and definitions
6. References

Appendix A. Example Dialogues




1.  Introduction 

This paper describes a natural language processing system. The system interacts
with a human user, who describes a computer program to it in English. The output of
the system is a program specification, a formal representation of the computer
program the user has described. The program specification can be used as a data
base for coding the user's program by computer programs without linguistic abilities.

Understanding program descriptions obtained via dialogues requires capabilities for
handling almost all issues associated with natural language processing. Indeed,
[Hobbs 77] mentions that even processing "well written algorithm descriptions"
involves "...some of the hardest problems of linguistic analysis." Since many of the
program descriptions posed by the users of the system can best be characterized
as "not so well written", the system's natural language abilities must be extensive.

The system is most naturally viewed as two interrelated programs: a parser and an
interpreter. Reader, the parser, provides the means of storing and utilizing the
information about sentence structure (called syntax) which is necessary for the
proper interpretation of the meaning of a sentence. Reader is used to transform
the user's replies from strings of words into structures in which the relations
between words are made explicit. The interpreter uses the structures supplied by
Reader to construct the program specification.




1.1  Organization 

The next section discusses the natural language abilities an automatic programming
natural language system should have. The following section contains three short
examples which should help to clarify exactly what is meant by the program
specification, and provide some perspective on the natural language processing
done by the system. The parser/interpreter can be used as part of a more
complete automatic programming system. Section 1.4 briefly describes this system and the
interpreter's interaction with it. Section 1.5 is a short overview of the operation of
both Reader and the interpreter.

Chapter 2 is a general discussion of Reader. Chapters 3 and 4 continue that
discussion in much more detail. Chapter 5 describes the program specification and
how it is built by the interpreter. Appendix A contains several dialogues run by the
system.


1.2  Capabilities 

1.2.1  The  parser 

Reader was designed with the following criteria in mind.

The parser should be able to recognize a substantial subset of English, and the
parsing should be done quickly, so that the parser can be used in a practical
system. We mention parsing speed and grammar coverage together because it is
easy, at least in theory, to achieve either one separately. Almost all parsing




schemes can parse a small set of sentences quickly, but few do as well when
recognizing a large number of sentences while at the same time using a vocabulary
which includes all possible syntactic uses for each word in the vocabulary. Reader
achieves speed without sacrificing grammar breadth because its parsing process
can combine several syntactic possibilities into a single parse path, thereby
avoiding much of the backtracking (or, equivalently, parallel processing) which
characterizes many other parsing schemes.
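The saving that comes from combining syntactic possibilities can be seen in a toy calculation. The Python sketch below is a hypothetical illustration, not Reader's data structure: `split_paths` forks a separate path for every combination of word classes, as a purely backtracking parser ultimately must, while `packed_path` keeps one path whose positions hold the whole set of candidate classes. The lexicon entries are assumptions chosen for the example.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to every word class it can fill.
LEXICON = {
    "set": {"noun", "verb", "adjective"},
    "x": {"noun"},
    "to": {"preposition"},
    "the": {"determiner"},
    "empty": {"adjective", "verb"},
}

def split_paths(words):
    """One parse path per combination of word classes (backtracking style)."""
    choices = [sorted(LEXICON[w]) for w in words]
    return list(product(*choices))

def packed_path(words):
    """A single path whose positions carry all candidate classes at once."""
    return [frozenset(LEXICON[w]) for w in words]

sentence = ["set", "x", "to", "the", "empty", "set"]
print(len(split_paths(sentence)))   # 3 * 1 * 1 * 1 * 2 * 3 = 18 separate paths
print(len(packed_path(sentence)))   # 6 positions on one path
```

With only three ambiguous words the split approach already needs 18 paths; the packed path stays single, and it is left to the grammar to narrow each set as later words arrive.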

There should be a well defined interface between the parser and interpreter which
allows the parser to interact with the interpreter and ask it to choose from among
competing parses which are possible syntactic interpretations of a sentence. This
is necessary because many sentences have more than one syntactic interpretation.
For example, in "...find a relation in the concept marked 'possible.'", the parser
must be able to ask whether the object of "find" is "a relation whose marking is
'possible' which is in the concept", or "a relation which is in the marked
('possible') concept."

The parser should be able to use the evaluation function of the interpreter to
provide parses in which most purely "function" words are eliminated. Consider the
sentence, "Classify the input list on the basis of whether or not it fits the initial
list". The interpreter should be asked to judge the modifications among "on the
basis of", "classify" and the clause introduced by "whether". The parser should
then incorporate the answers into the parse, resulting in a parse structure much
closer to the meaning of the sentence than a mere syntactic structure:




(IMP
  (CLASSIFY NN
    (ARGS (LIST THE INPUT))
    (PROC (FIT NN
            (ARGS IT)
            (ARGS (LIST THE INITIAL))))))


The parse can be interpreted as,

Perform a classification. The argument of the classification
is the input list. The procedure for carrying out the
classification is to test if the input list fits the initial list.
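A parse of this form is an ordinary Lisp-style list, which is what makes it usable by a program without linguistic knowledge. As a hedged illustration, the Python sketch below reads the structure from a string and paraphrases it in the manner of the interpretation above. It assumes the procedure entry reads `(FIT NN ...)` and treats `NN` as an opaque marker; the helpers `tokenize`, `read`, `flat` and `describe` are illustrative, not routines of the system.

```python
def tokenize(text):
    """Split a Lisp-style expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """Build nested Python lists from a token list, consuming it from the front."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # discard the closing parenthesis
        return items
    return token

def flat(value):
    """Render an atom or a flat list of atoms as text."""
    return " ".join(value) if isinstance(value, list) else value

def describe(node):
    """Paraphrase an action node shaped like (VERB NN (ARGS ...) (PROC ...))."""
    verb, slots = node[0], node[2:]  # node[1] (NN) is treated as an opaque marker
    parts = [f"Perform {verb}."]
    for name, value in slots:
        if name == "ARGS":
            parts.append(f"Argument: {flat(value)}.")
        elif name == "PROC":
            parts.append(f"Procedure: {describe(value)}")
    return " ".join(parts)

PARSE = ("(IMP (CLASSIFY NN (ARGS (LIST THE INPUT))"
         " (PROC (FIT NN (ARGS IT) (ARGS (LIST THE INITIAL))))))")
tree = read(tokenize(PARSE))
print(describe(tree[1]))
```

Resolving "IT" to the input list is left out on purpose; that is a reference problem for the interpreter, not something the parse structure settles by itself.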


The parser's efficiency should not depend on using the interpreter to discontinue a
possible parse of a sentence on semantic grounds. The parser-interpreter
interface should only be asked to evaluate parses which are syntactically
equivalent. Two partial parses are syntactically equivalent if both will lead to a
successful parse on the same sentence endings, or if the end of the sentence has
been reached and each is a successful parse. The reason for this decision is that
in a rich environment we would expect the semantic processing required to
discontinue a parse to be more expensive than the syntactic processing required to
determine that the parse cannot lead to a syntactic interpretation. Woods, in
[Woods 73], has experimented along these lines and found that (in his case) "...it
looks as if it takes longer to do the parsing and semantic interpretation overall if
the interpretation is done during the parsing than it does if the parsing is done first
and the interpretation afterwards." Of course, semantic processing will have to be
done to determine which syntactic parse of the sentence is most meaningful; the
point is that we wish to avoid any semantic analysis whose effect could be
achieved through syntactic analysis.
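The definition of syntactic equivalence has a close analogue in finite automata: in a deterministic automaton, two prefixes that end in the same state accept exactly the same set of endings. Reader's grammars are Lisp programs rather than automata, so the Python sketch below is only an analogy under that assumption; the toy transition table and the helpers `run` and `group_equivalent` are hypothetical.

```python
# Hypothetical toy automaton over word classes (not Reader's grammar).
TRANSITIONS = {
    ("start", "det"): "np_open",
    ("np_open", "adj"): "np_open",
    ("np_open", "noun"): "np",
    ("np", "noun"): "np",          # noun-noun compounds such as "street lights"
    ("np", "verb"): "vp",
    ("vp", "det"): "obj_open",
    ("obj_open", "noun"): "obj",
}

def run(classes, state="start"):
    """Advance a partial parse through a sequence of word classes.
    Returns the final state, or None if the parse cannot continue."""
    for word_class in classes:
        state = TRANSITIONS.get((state, word_class))
        if state is None:
            return None
    return state

def group_equivalent(partials):
    """Group partial parses by state: prefixes ending in the same state of a
    deterministic automaton accept exactly the same sentence endings."""
    groups = {}
    for label, classes in partials.items():
        state = run(classes)
        if state is not None:
            groups.setdefault(state, []).append(label)
    return groups

# Three readings of "the glowing street lights".
partials = {
    "glowing=adj, lights=noun": ["det", "adj", "noun", "noun"],
    "glowing=noun, lights=noun": ["det", "noun", "noun", "noun"],
    "glowing=adj, lights=verb": ["det", "adj", "noun", "verb"],
}
print(group_equivalent(partials))
```

Only the two readings grouped under `np` would be candidates for the interpreter's judgment. The `vp` reading is separated from them by syntax alone: an ending such as "the way" (determiner, noun) continues it, but not them.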




The assumption about the relative costs of semantic and syntactic processing
cannot be proved. We can note, however, that even the simplest kinds of semantic
checks can require arbitrary amounts of inference in a general system. For
example, consider the decision of whether a pair of words ("street lights", for
example) is a compound noun, or a noun followed by a verb. At first glance, it would
seem that this could be cheaply done by simply checking a marker on the first word
("street") which indicates whether it is a suitable subject for the proposed verb
("lights"). However, there are two problems with this approach. One is that simple
markers on words are inadequate for dealing with the problems of language. Many
words can be modified so that they are acceptable subjects for verbs which are
not ordinarily associated with them, e.g., "The glowing radioactive street lights the
way for ...". The process of determining whether a modified noun is a suitable
subject for an arbitrary verb seems beyond simple look-up techniques. The second
problem is that even if the potential subject is unmodified, the syntax and meaning
of the remainder of the sentence may constrain the behavior of the ambiguous pair
to be the opposite of what one might expect. For instance, "water boils" would be
predicted to be a noun-verb pair, yet in "Water boils are dangerous parasites
which can be found in the Great Lakes.", it acts as a compound noun. It should also
be noted that occasionally semantic analysis will be unable to act as a filter. "Set
X" may be either a noun-verb pair or a noun and its appositive. The only way to tell
is to know the syntactic context the words appear in. In "Set X to the empty set.",
"set" acts as a verb; in "Set X is the empty set.", "set" acts as a noun.




1.2.2 The Interpreter

The interpreter must be able to do the following:

1. Ask questions of the user. This enables the system to clarify actions it has
taken and prompt the user for information he has omitted.

2. Understand three different types of user statements:

User statements meant as steps in the program. These are translated into
primitives in the program specification language. This is the basic method for
building the program specification. "Print the greatest number in the list"
must be translated into an "output" primitive with an argument representing
"the greatest number".

User statements directed as meta comments about the dialogue. These are
translated into case frames which express their intent. This allows the user
to control the flow of the dialogue. "Ask me about the structure of the data
base first." must be interpreted as a request for a different question, rather
than part of the program being written.

Finally, some user statements should be understood as general comments
about the program rather than as explicit instructions on coding it. "The
program scores and retrieves data." is meant as an overall description of a
program, not its first two steps.

3. Identify any objects and actions mentioned by the user with their correct
referent in the program specification. If the user says "After printing it, print the




list containing it.", the interpreter must find a referent for "it", determine which
"list" is meant, and match "printing it" to the appropriate operation in the program
specification.

4. Use the question it has asked to aid in understanding the user's replies. In
processing a description of two data structures, which are referred to as the
"scene" and "concept", "The same as the concept." should be understood to mean
"The scene has the same structure that the concept has." if the question
asked is "What is the structure of the scene?" However, the system must also be
able to accept more information (in any order) than its question has asked for, e.g.,

What is the definition of the predicate "Reach"?

A node X is connected to a node Y if there exists a pair in the
graph such that X and Y are in the pair. X can be reached from Y
if X is connected to Y or if X can be reached from a node which is
connected to Y.

5. Learn definitions for any undefined words used by the user. If the system is to
be robust, it must be able to infer certain information about words, rather than
depend on knowing everything in advance. In the example above, the system
inferred that "connected" is a binary predicate on nodes. If it is necessary to
preprogram information of this sort, the system will fail every time an unfamiliar word
is used, even though the word occurs in a context in which its meaning is apparent.

6. Incorporate implied instructions from the user into the program specification while
avoiding redundancy if the same instruction is later made explicit. Consider,

1. Print the result of the test, ask the user if this is correct, and
read in the user's response.

versus

2. Print the result of the test and ask the user if this is correct.




In both 1. and 2., the next question the system should ask is "What is the structure
of the user's response?". In 1., there is an explicit input operation mentioned. In
2., the system must infer the input operation because "ask" implies both an output
and an input. The system must be able to supply an input for case 2., but realize
that the user has already mentioned the input for case 1. This is not as trivial as
just checking for an input after every output generated from "ask", since if the
user says,

"Output the result of the test and ask the user if this is correct.

Then read in another test item.",

the system must still ask for the structure of the user's response.

7. Use a certain amount of programming knowledge to aid in its construction of the
program specification. Understanding many of the user's replies will require
specific bits of programming knowledge. If the system asks, "What is the exit test
of the loop?", and the user replies, "Stop when 'quit' is typed", the interpreter must
know that this means to test the argument of the (presumably one) input operation
in the loop to see if it is "quit". If so, the loop should be exited. The same
information tells the interpreter that the test should be inserted into the program
after the input operation.
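The "Reach" definition in item 4 is recursive: X can be reached from Y if X is connected to Y, or if X can be reached from a node which is connected to Y. To show what that definition computes (not how the interpreter represents it), here is a Python sketch. It takes as given the inference that "connected" is a binary predicate on nodes, treats the graph as a list of pairs following the user's wording "X and Y are in the pair", and unfolds the recursion into a breadth-first search so that cycles terminate. The function names and the sample graph are hypothetical.

```python
from collections import deque

def connected(graph, x, y):
    """X is connected to Y if some pair in the graph contains both X and Y."""
    return any(x in pair and y in pair for pair in graph)

def reachable(graph, x, y):
    """X can be reached from Y if X is connected to Y, or if X can be reached
    from a node which is connected to Y. The recursion is unfolded into a
    breadth-first search outward from Y so that cycles terminate."""
    if connected(graph, x, y):
        return True
    nodes = {n for pair in graph for n in pair}
    seen, frontier = {y}, deque([y])
    while frontier:
        current = frontier.popleft()
        for n in nodes:
            if n not in seen and connected(graph, n, current):
                if connected(graph, x, n):
                    return True
                seen.add(n)
                frontier.append(n)
    return False

graph = [("a", "b"), ("b", "c"), ("d", "e")]
print(reachable(graph, "c", "a"), reachable(graph, "d", "a"))  # True False
```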


1.3 Three Examples

This section consists of three brief examples¹ intended to illustrate the extent of
the processing done by the system.

¹ Every example in this paper was produced by the system.




WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE?

Write me a program called intersection which reads a set of lists
of numbers and prints the numbers which are in all of them.


INTERSECTION
A1 <- READ()
A4 <- The set of all A2 in all A3 in A1 such that:
        FORALL(B1) IMPLIES(MEMBER(B1 A1)
                           MEMBER(A2 B1))
For all A2 in A4 do:
    PRINT(A2)

B1 is a variable bound to A3.

A3 is the generic element of A1.

A1 is a set whose generic element is a list whose generic element
is a number.

Figure 1.1

An algorithm description and its program specification

The top section of Figure 1.1 contains a description (in answer to the system's
question) of a program which finds the intersection of a set of lists of numbers. The
program specification for the example is shown in Figure 1.2.
It consists of a series of interconnected nodes which represent the various
components of the program. Each component type is fully described in Chapter 5.
For large programs, the program specification is too bulky (and generally unreadable)
to exhibit, so a "pretty printed" version of it will be shown instead. A simple
program is used to print the specification as an Algol-like control structure with data
descriptions in English. The result of printing the specification in Figure 1.2 is
shown beneath the algorithm description in Figure 1.1.
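The English data descriptions printed beneath the algorithm in Figure 1.1 ("A1 is a set whose generic element is a list whose generic element is a number") follow a simple recursive pattern. Here is a minimal Python sketch of such a printer. It assumes a hypothetical encoding of specification nodes as dictionaries with a `type` entry and an optional `element` slot; the slot names are borrowed from Figure 1.2, but the encoding and the function names are not the system's.

```python
def describe_data(node):
    """Render a data node as 'a <type> whose generic element is ...'."""
    text = "a " + node["type"].lower()
    if "element" in node:
        text += " whose generic element is " + describe_data(node["element"])
    return text

def data_sentence(name, node):
    """Produce one data-description sentence for a named variable."""
    return f"{name} is {describe_data(node)}."

A1 = {"type": "SET", "element": {"type": "LIST", "element": {"type": "NUMBER"}}}
print(data_sentence("A1", A1))
```

Because each level only fills in a fixed frame, the printer stays simple, which is also why it cannot adapt its phrasing the way a real English generator would.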




[Diagram: the program specification drawn as a network of interconnected component
nodes. Node types shown, with their slots: PROCEDURE (name, definition), NAME (value
INTERSECTION), SEQUENTIAL (steps), INPUT (args), COMPUTE (result, assertions, on),
SET (element), LIST (element), NUMBER, ENUMERATE (steps, on), OUTPUT (args),
BOUND (boundto), FORALL (predicate, bindings), IMPLIES (antecedent, consequent),
MEMBER (element, set).]

Figure 1.2

"Write me a program called Intersection which inputs a
set of lists of numbers and prints the numbers which
are in all of them."




The relation between the specification and its "pretty printing" is apparent. As an
example, consider the printing of the ENUMERATE component. ENUMERATEs are the
specification primitive for performing an action (the STEPS slot) on each element of
a set (the ON slot). To "pretty print" an ENUMERATE component, the printing
program merely concatenates

For all
<the ELEMENT of <the ON of ENUMERATE>>
in
<the ON of ENUMERATE>
do:
<the STEPS of ENUMERATE>
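The concatenation can be written as a short recursive printer. In this Python sketch the dictionary encoding, the `name` entries for variables and the function `pp` are assumptions made for illustration. Only the ENUMERATE and OUTPUT component types are covered, and the nodes below are chosen to reproduce the loop printed in Figure 1.1.

```python
def pp(node):
    """Pretty print a specification component; named nodes print as their names."""
    if "name" in node:
        return node["name"]
    kind = node["type"]
    if kind == "ENUMERATE":
        # For all <the ELEMENT of <the ON>> in <the ON> do: <the STEPS>
        header = f"For all {pp(node['on']['element'])} in {pp(node['on'])} do:"
        steps = ["    " + line
                 for step in node["steps"]
                 for line in pp(step).splitlines()]  # indent every line of each step
        return "\n".join([header] + steps)
    if kind == "OUTPUT":
        return "PRINT(" + " ".join(pp(arg) for arg in node["args"]) + ")"
    raise ValueError(f"no printer for component type {kind}")

A2 = {"name": "A2"}
A4 = {"name": "A4", "type": "SET", "element": A2}
loop = {"type": "ENUMERATE", "on": A4, "steps": [{"type": "OUTPUT", "args": [A2]}]}
print(pp(loop))
```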

One of the points this example makes is that the best method for implementing the
user's program is not necessarily contained in the program specification. If the
program specification were implemented straightforwardly, the resulting program
would be grossly inefficient. The intersection set would be created by cycling
through every list in the input set, rather than just the first. The reason, of course,
is that the user has asked the system to write a program which finds a set
satisfying some assertions, but has not given a method for determining the set.
Therefore the system falls back on the method of testing each element in the set to
see if it satisfies the assertions. The interpreter can accept more process-oriented
program descriptions. Figure 1.3 contains a very low level description of an
"intersection" program.
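The inefficiency is easy to see in code. The first function below implements the specification of Figure 1.1 literally: it collects every number in every list and keeps those that are members of all the lists. The second draws its candidates from the first list only, which is the kind of choice the specification leaves open. Both are illustrative Python sketches; neither is code produced by PSI's coding module.

```python
def intersection_literal(lists):
    """A4 <- the set of all A2 in all A3 in A1 such that
    FORALL(B1) IMPLIES(MEMBER(B1 A1) MEMBER(A2 B1))."""
    candidates = {a2 for a3 in lists for a2 in a3}   # cycles through every list
    return {a2 for a2 in candidates if all(a2 in b1 for b1 in lists)}

def intersection_first_only(lists):
    """Same result, but candidates come from the first list alone."""
    if not lists:
        return set()
    return {a2 for a2 in lists[0] if all(a2 in b1 for b1 in lists[1:])}

lists = [[1, 2, 3, 4], [2, 4, 6], [4, 2, 8]]
print(intersection_literal(lists), intersection_first_only(lists))
```

Any member of the intersection must appear in the first list, so restricting candidates to it loses nothing while avoiding tests on numbers that can never qualify.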






WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE?

Intersection

DESCRIBE INTERSECTION.

It reads a list X.

WHAT IS THE STRUCTURE OF THE GENERIC ELEMENT OF X?

A list of of numbers.

ARE WE FINISHED?

Set First and Output to the head of X. Let Rest equal the tail of X.
If First is empty, print Output and return. If Rest is empty, set
First equal to the tail of First, set Rest to the tail of X, and go
back to the first conditional. If the first element in First is not in
the head of Rest then remove it from Output, set First to its tail,
set Rest to the tail of X and go to the conditional. Otherwise, set
Rest to the tail of Rest and return to the conditional.


INTERSECTION
X <- READ()
FIRST <- HEAD(X)
OUTPUT <- HEAD(X)
REST <- TAIL(X)
Label1: If EQUAL(FIRST PHI)
          Then PRINT(OUTPUT)
               RETURN()
        If EQUAL(REST PHI)
          Then FIRST <- TAIL(FIRST)
               REST <- TAIL(X)
               GOTO Label1
        If NOT(MEMBER(HEAD(FIRST) HEAD(REST)))
          Then OUTPUT <- REMOVE(HEAD(FIRST) OUTPUT)
               FIRST <- TAIL(FIRST)
               REST <- TAIL(X)
               GOTO Label1
          else REST <- TAIL(REST)
               GOTO Label1

REST is a list whose generic element is a list whose generic element
is a number.

OUTPUT is a list whose generic element is a number.

FIRST is a list whose generic element is a number.

X is a list whose generic element is a list whose generic element is
a number.

Figure 1.3

A low level description of Intersection
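Because the specification in Figure 1.3 is already close to code, it transliterates almost line for line. The Python sketch below is such a transliteration, offered only as a check on what the user's description computes. Python has no GOTO, so a `while True` loop with `continue` stands in for Label1, and the function returns OUTPUT instead of printing it. Like HEAD(X), it assumes X is non-empty; the function name is illustrative.

```python
def intersection_fig_1_3(x):
    """Transliteration of the Figure 1.3 specification.
    'continue' plays the role of 'GOTO Label1'."""
    first = list(x[0])          # FIRST <- HEAD(X)
    output = list(x[0])         # OUTPUT <- HEAD(X)
    rest = list(x[1:])          # REST <- TAIL(X)
    while True:                 # Label1
        if not first:                       # If EQUAL(FIRST PHI)
            return output                   # PRINT(OUTPUT), RETURN(): returned here
        if not rest:                        # If EQUAL(REST PHI)
            first = first[1:]               # FIRST <- TAIL(FIRST)
            rest = list(x[1:])              # REST <- TAIL(X)
            continue                        # GOTO Label1
        if first[0] not in rest[0]:         # NOT(MEMBER(HEAD(FIRST) HEAD(REST)))
            output = [e for e in output if e != first[0]]  # REMOVE(HEAD(FIRST) OUTPUT)
            first = first[1:]               # FIRST <- TAIL(FIRST)
            rest = list(x[1:])              # REST <- TAIL(X)
            continue                        # GOTO Label1
        rest = rest[1:]                     # else REST <- TAIL(REST); GOTO Label1

print(intersection_fig_1_3([[1, 2, 3, 4], [2, 4, 6], [4, 2, 8]]))  # [2, 4]
```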




As a prelude to Chapter 5, note that even though this dialogue (unlike most dialogues the
system handles) translates fairly directly into primitives in the program
specification, there are still several natural language problems embedded in it.

Different ways of specifying the same action:

Set X equal to Y.    Go back to X.
Set X to Y.          Go to X.
Let X equal Y.       Return to X.

Reference problems:

...its tail.
...remove it from Output.
...go to the first conditional.
...the conditional.
...the first element in First.

Language conventions:

Otherwise, set...
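The first group of variations can be illustrated with a deliberately shallow Python sketch that maps each surface form onto one primitive. The regular expressions, the function `normalize` and the primitive name ASSIGN are illustrative assumptions (the figures write assignment as `<-`). The interpreter works from Reader's parses rather than string patterns, and patterns like these would do nothing for the reference problems in the second group.

```python
import re

# Hypothetical surface patterns mapped to illustrative primitive names.
PATTERNS = [
    (re.compile(r"(?:set|let) (\w+) (?:equal to|to|equal) (.+)", re.IGNORECASE), "ASSIGN"),
    (re.compile(r"(?:go back to|go to|return to) (.+)", re.IGNORECASE), "GOTO"),
]

def normalize(sentence):
    """Map one imperative sentence onto a (PRIMITIVE, operands...) tuple, or None."""
    text = sentence.strip().rstrip(".")
    for pattern, primitive in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return (primitive,) + tuple(group.upper() for group in match.groups())
    return None

for s in ["Set X equal to Y.", "Set X to Y.", "Let X equal Y.",
          "Go back to X.", "Go to X.", "Return to X."]:
    print(s, "->", normalize(s))
```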


The interpreter can handle more "structured" low level descriptions. Figure 1.4
contains an example.




WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE?

Intersection

DESCRIBE INTERSECTION.

It reads a list X. X is a list of lists of numbers. Let First equal
the head of X and Output equal First. While First is not empty, set
Rest to the tail of X. Then while Rest is not empty, if the head of
First is not a member of the head of Rest, remove it from Output and
set Rest to empty. Otherwise set Rest to the tail of Rest.

DOES THE SECOND LOOP BELONG IN THE TOP LEVEL OF INTERSECTION OR IN
THE TOP LEVEL OF THE FIRST LOOP?

In the loop.

ARE WE FINISHED?

After the second loop, set First to the tail of First.

ARE WE FINISHED?

Print Output after the first loop.


INTERSECTION
X <- READ()
FIRST <- HEAD(X)
OUTPUT <- FIRST
While NOT(EQUAL(FIRST PHI)) do:
    REST <- TAIL(X)
    While NOT(EQUAL(REST PHI)) do:
        If NOT(MEMBER(HEAD(FIRST) HEAD(REST)))
          Then OUTPUT <- REMOVE(HEAD(FIRST) OUTPUT)
               REST <- PHI
          else REST <- TAIL(REST)
    FIRST <- TAIL(FIRST)
PRINT(OUTPUT)

REST is a list whose generic element is a list whose generic element
is a number.

OUTPUT is a list whose generic element is a number.

FIRST is a list whose generic element is a number.

X is a list whose generic element is a list whose generic element is
a number.

Figure 1.4

A more structured Intersection program
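A transliteration of Figure 1.4 shows how the user's answers placed the pieces. The second loop sits inside the first, the update of FIRST follows the inner loop, and the PRINT follows the outer loop. This is an illustrative Python sketch with a hypothetical function name. It returns OUTPUT rather than printing it and assumes X is non-empty.

```python
def intersection_fig_1_4(x):
    """Transliteration of the Figure 1.4 specification (nested loops, no GOTOs)."""
    first = list(x[0])          # FIRST <- HEAD(X)
    output = list(first)        # OUTPUT <- FIRST
    while first:                # While NOT(EQUAL(FIRST PHI)) do:
        rest = list(x[1:])      #     REST <- TAIL(X)
        while rest:             #     While NOT(EQUAL(REST PHI)) do:
            if first[0] not in rest[0]:                        # NOT(MEMBER(...))
                output = [e for e in output if e != first[0]]  # REMOVE(HEAD(FIRST) OUTPUT)
                rest = []                                      # REST <- PHI
            else:
                rest = rest[1:]                                # REST <- TAIL(REST)
        first = first[1:]       #     FIRST <- TAIL(FIRST), after the second loop
    return output               # PRINT(OUTPUT), after the first loop

print(intersection_fig_1_4([[1, 2, 3, 4], [2, 4, 6], [4, 2, 8]]))  # [2, 4]
```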




In general, the program descriptions the interpreter is asked to handle will be a
cross between high level descriptions like the first dialogue and low level
descriptions like the second two. The dialogues in Appendix A provide further
examples of this.

In the dialogue from Figure 1.4, the interpreter had to ask the user whether the
second loop was embedded in the first. More programming knowledge would have
supplied the answer for the interpreter.² It should have been obvious that the
description of the first loop was incomplete, since the exit test checked the value
of a variable whose value remained unchanged in the loop. Such knowledge is beyond
the scope of the present parser/interpreter project. Instead, it is made available
to the interpreter via the PSI system [Green 76].


1.4 PSI


The parser/interpreter has been designed to run as a part of the PSI automatic
program synthesis system. The PSI system, which is being written as a group
project at the Stanford University Artificial Intelligence Laboratory, consists of a
number of different modules, one of which is the parser/interpreter system.
Together, the parser/interpreter and the other PSI modules form a complete
automatic programming system.

The most obvious addition supplied by the PSI system is the coding and efficiency
module, which is intended to produce optimized LISP or SAIL code from the program
specification. Thus the user is encouraged to use a very high level description for
his program, since the specification specifies the performance of the desired
program, but not its implementation. See [Barstow 77] and [Kant 77].

² As we have mentioned, the interpreter has some programming knowledge; for
instance, it knows enough to know it doesn't know where the loop goes.

The remaining modules in PSI help support the dialogues run by the
parser/interpreter. The parser/interpreter can run independently of them, but its
performance is weak (or nonexistent) in the areas these modules were designed
for. The other PSI modules are:

An English generator being developed by Richard Gabriel. The generator
should not be confused with the English data description printer used in
pretty printing the program specification. The data description printer uses a
"fill in the blanks" paradigm (X is a Y with Z whose Q etc.), which is adequate
for its purposes. The completed PSI generation system will include a program
explanation module which will displace the data description printer.

A programming knowledge module. This module is responsible for checking
the consistency of the program specification, supplying questions to be asked
in case of inconsistencies, and answering questions whose answers can be
derived from information about programming. [McCune 77]

A domain knowledge module which is being written by Jorge Phillips. This
module is analogous to the programming knowledge module except that it has
information about the specific type of program written, as opposed to
programming in general. It might know, for instance, that in a text editing
domain, when the user says "exit the file", he means "write all the changes
made onto the disk and then exit the file."




A traces and examples module which enables the user to describe his
program in terms of examples and traces as well as English. [Phillips 78]

A dialogue moderator which coordinates the various PSI modules, chooses
which question to ask the user next, and processes the user's comments
about the dialogue supplied to it by the parser/interpreter. [Steinberg 78]


1.6 An Overview

1.6.1 Reader

Reader can be briefly described as a left-to-right parser that uses a combination of
top-down and bottom-up strategies. The method used at any point in a parse is
determined by the grammar writer. The grammar consists of a set of Lisp programs
which manipulate the data structures and data structure building primitives supplied
by the parser.

Reader is able to efficiently recognize a large subset of English because it seldom
needs to maintain more than one possible parse of a sentence. It should be
stressed, however, that Reader is not completely deterministic.³ Complete
determinism does not seem possible when dealing with a large grammar and
vocabulary in which most words can fulfill more than one syntactic role.

3 Almost all the nondeterminism arises from words which belong to more than one
word class; e.g., if a word can act as either a verb or a noun, Reader must try both
possibilities separately.

The characteristics which allow Reader to parse nearly deterministically are listed
below. In Section 3.2, these characteristics are divided into essentially three
different categories.

1. A sentence constituent is only built when the parser knows that there
is at least one other constituent that has already been built that can
accept the first as a modifier.

2. A constituent is attached (i.e., proposed as a modifier) to another
constituent only when the attachment is forced by the syntax of the
sentence. A simple example of "delayed attachment" occurs in the
sentence "The program called Intersection ...". The constituent "called
Intersection" is not attached to "The program" until the words following
"Intersection" require that the attachment be made.

3. Because of 2., when a constituent is attached to another, the parser
generally knows the reason for the attachment, and can use that reason
to guide it in making the attachment. For instance, in "The program
called Intersection was written by George.", "was" forces "called
Intersection" to be attached to "The program". The reason for the
attachment is to allow "The program" to be the subject of "was", so it is
clear that "called Intersection" is to be attached as a relative clause
modifying "program", since if it were attached as the main verb, there
would be no place to put "was". In "The program called Intersection and
returned ...", when "and returned" is read, the parser knows that the
clause "called Intersection" must be an active construction (as opposed
to the passive construction which leads to the relative clause
interpretation) so that it can be attached to "The program" as the
predicate of the sentence.

4. The parser uses one syntactic structure to represent more than one
possibility. In "The program called Intersection ...", the structure "called
Intersection" simultaneously represents the predicate of the sentence
and a relative clause. Which interpretation to use is determined after
more of the sentence has been read.

5. The parser provides for local ambiguity in the parse structure that it
returns. For instance, "I know that ice is dangerous" could mean either
"I know ice is dangerous." or "I know that that (particular) ice is
dangerous.". The parser finds both interpretations following a single
parse path. After reaching the ambiguity, it continues along that single path
by preparing an output structure in which the subject of
"is" is a choice between "that ice" and "ice".
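As a rough illustration of item 5, the sketch below keeps a local ambiguity inside one output structure. The `Choice` node, the `Clause` record and the `readings` helper are hypothetical names introduced here. Reader's own Lisp representation is not shown in this section. The embedded clause of "I know that ice is dangerous" is built once, with its subject slot holding both candidates. It is expanded into separate readings only when something downstream needs them.

```python
from dataclasses import dataclass, field


@dataclass
class Choice:
    """Hypothetical node: the slot may be filled by any one of the alternatives."""
    alternatives: list


@dataclass
class Clause:
    verb: str
    cases: dict = field(default_factory=dict)  # case name -> filler or Choice


def readings(clause):
    """Expand every Choice node into the separate clauses it stands for."""
    results = [{}]
    for case, filler in clause.cases.items():
        options = filler.alternatives if isinstance(filler, Choice) else [filler]
        results = [{**r, case: opt} for r in results for opt in options]
    return [Clause(clause.verb, r) for r in results]


# "I know that ice is dangerous": one parse path, one structure, two readings.
embedded = Clause("is", {"SUB": Choice(["that ice", "ice"]), "PRED": "dangerous"})
print([r.cases["SUB"] for r in readings(embedded)])  # ['that ice', 'ice']
```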


As we have indicated, occasionally Reader must pursue more than one parse path at
a time. To avoid analyzing the same sentence constituent each time it is
encountered on a different parse path, Reader uses a variation of the well-formed
substring table idea (section 4.4). This enables a constituent which has been
analyzed to be effectively shared by each parse path that can use it.
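The general idea of a well-formed substring table can be sketched as memoization. Once a constituent starting at a given word has been analyzed, every parse path that asks for it gets the stored result. This sketch assumes a toy noun-group recognizer and word lists. Because there is only one category, the table is keyed by start position alone. The variation Reader actually uses is the subject of section 4.4.

```python
from functools import lru_cache

ARTICLES = {"the", "a"}
NOUNS = {"program", "list", "number"}


def make_table(words):
    """Return a memoized noun-group recognizer for one sentence, plus a log."""
    analyses = []  # start positions that were actually analyzed (cache misses)

    @lru_cache(maxsize=None)  # the table: start position -> end position
    def noun_group(start):
        analyses.append(start)
        i = start
        if i < len(words) and words[i] in ARTICLES:
            i += 1
        if i < len(words) and words[i] in NOUNS:
            return i + 1          # constituent spans words[start:i + 1]
        return None               # no noun group starts here

    return noun_group, analyses


words = "print the list".split()
noun_group, analyses = make_table(words)
# Two competing parse paths (say, "print" read as a verb and as a noun) both
# need the noun group that starts at position 1; it is analyzed only once.
path_a = noun_group(1)
path_b = noun_group(1)
print(path_a, path_b, analyses)  # 3 3 [1]
```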

The parser-interpreter interface is only called to rate structures which are about to
be attached to other structures. Structures are attached to other structures only
when the syntax of the sentence forces the attachment. These two facts imply
that the parser-interpreter interface will only be asked to evaluate those parses
which are syntactically equivalent.⁴ For a simple example of this, consider "The
number in the list the program printed was ...". "Was" forces "The number", "in
the list", and "the program printed" to be attached to one another for the purpose
of allowing "The number" to be the subject of "was". The parser-interpreter
interface must choose between structures which represent the meanings "The
number which was printed and in the list." and "The number which was in the
printed list." Since each structure plays the same syntactic role, namely that of a
noun group, any sequence of words following "was" will lead to a parse for either
both or neither of the two interpretations.

4 Two parses are syntactically equivalent if and only if the end of the sentence
has been reached and both are successful parses, or if both will lead to a
successful parse on the same sentence endings.

Reader's interface with its interpreter is a program called Format which rates each
syntactic structure built by Reader before it is attached to another. The criteria
measured by the interface are:

1. Does the verb of the structure (if there is one) have enough of its
cases filled in to properly specify the action it represents? For
example, the verb "put" requires a case which specifies where the
object of "put" was put.

2. How appropriate are the noun groups in the structure? For instance,
the noun group "water boils" would be judged inappropriate.

3. How appropriate are the contents of the cases of the structure's
verb? For instance, "street" is an inappropriate subject for "light".

The results of the rating are used to pick the most meaningful structure from among
equivalent syntactic possibilities. Structures which evaluate poorly can still be
included in the parse of the sentence, as long as there are no other parses which
contain structures with better evaluations. The parse of "Water boils are very
small." contains the "inappropriate" noun group "water boils", since there is no
syntactic interpretation of the sentence which does not use "water boils" as a noun
group.
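The three criteria can be sketched as a penalty function used to choose among syntactically equivalent structures. Everything concrete here is a hypothetical stand-in: the knowledge tables, the numeric penalties, and the names `rate` and `choose`. The text says only what Format measures, not how the ratings are computed or combined.

```python
# Hypothetical knowledge tables standing in for the Interpreter.
REQUIRED_CASES = {"put": {"OBJ", "WHERE"}}         # criterion 1
ODD_NOUN_GROUPS = {("water", "boils")}             # criterion 2
GOOD_SUBJECTS = {"light": {"lamp", "match"}}       # criterion 3


def rate(structure):
    """Return a penalty for one structure; lower means more meaningful."""
    verb, cases = structure["verb"], structure["cases"]
    penalty = len(REQUIRED_CASES.get(verb, set()) - set(cases))
    penalty += sum(ng in ODD_NOUN_GROUPS for ng in structure["noun_groups"])
    allowed = GOOD_SUBJECTS.get(verb)
    if allowed is not None and cases.get("SUB") not in allowed:
        penalty += 1
    return penalty


def choose(equivalent):
    """Pick the best-rated of several syntactically equivalent structures.

    A poor rating never removes the only candidate, so a parse is still found.
    """
    return min(equivalent, key=rate)


street = {"verb": "light", "cases": {"SUB": "street"}, "noun_groups": []}
lamp = {"verb": "light", "cases": {"SUB": "lamp"}, "noun_groups": []}
boils = {"verb": "be", "cases": {"SUB": "boils"}, "noun_groups": [("water", "boils")]}
print(rate(street), rate(lamp), rate(boils))  # 1 0 1
print(choose([street, lamp]) is lamp, choose([boils]) is boils)  # True True
```

Using `min` over the candidates mirrors the point made above: "water boils" survives when it is the only syntactic option, even though it is penalized.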


1.6.2  The  Interpreter 

This section briefly touches on reference and concept matching, two of the
subjects mentioned in section 1.2.2, as an introduction to the methods used by the
Interpreter. They have been singled out because they are the basis of all higher
level inferences performed by the Interpreter. Chapter 6 covers these topics and much more in
greater detail.

The Interpreter's primary means of understanding user statements is via a set of
case frames and concepts. The case frames map English verbs and their modifiers
into the concepts, which can then be incorporated into the program specification.
As a simplified example, consider the concept of an input operation, denoted
#INPUT. For now, we will assume that #INPUT has three descriptors: its arguments
(ARGS), its place in the program specification (STEPOF), and the input device
(DEVICE).


#INPUT
    DESCRIPTORS:  ARGS    (isa #DATA)
                  STEPOF  (isa #ALG)
                  DEVICE  (isa #DEVICE)

#TYPE
    CASES:  SUBJECT  DEVICE
            OBJ      ARGS
    ISA  #INPUT
    DEFINITION-OF  TYPE

Figure 1.5

A concept and a definition which can be mapped to it.

Figure 1.5 shows the concept and a definition of "type" which can be mapped to it.
The definition says that if we have an instance of the verb "type", and its cases
(as determined by the parser) can be mapped successfully (i.e., the contents of the
cases satisfy the criteria in the descriptors of #INPUT), then we can view the
verb and its cases as an instance of the #INPUT concept and take the appropriate
action. Concepts can represent more than a single primitive in the program
specification language. For instance, "request" in "I'll request a story by giving a
key word." maps into an #INTERCHANGE concept which involves an INPUT and an
OUTPUT operation with a calculation of what should be output in between.
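A minimal sketch of the mapping pictured in Figure 1.5 follows. It assumes the concept and the definition can be written as dictionaries and that descriptor criteria are checked against a small type hierarchy. The hierarchy, the `#LIST`, `#STRING` and `#TERMINAL` types, and the function names are illustrative assumptions, not the Interpreter's own code.

```python
# Hypothetical type hierarchy: each type points to its parent (None at the top).
PARENT = {"#LIST": "#DATA", "#STRING": "#DATA", "#TERMINAL": "#DEVICE",
          "#DATA": None, "#DEVICE": None}


def isa(t, target):
    """True if type t is target or lies below it in the hierarchy."""
    while t is not None:
        if t == target:
            return True
        t = PARENT.get(t)
    return False


# Figure 1.5 as dictionaries: descriptor criteria, and case -> descriptor map.
INPUT = {"ARGS": "#DATA", "STEPOF": "#ALG", "DEVICE": "#DEVICE"}
TYPE_DEF = {"isa": "#INPUT", "cases": {"SUBJECT": "DEVICE", "OBJ": "ARGS"}}


def map_to_concept(parsed_cases, definition, concept):
    """Bind each parser case to a descriptor, or return None on a type mismatch."""
    bindings = {}
    for case, (filler, filler_type) in parsed_cases.items():
        descriptor = definition["cases"].get(case)
        if descriptor is None or not isa(filler_type, concept[descriptor]):
            return None
        bindings[descriptor] = filler
    return bindings


ok = {"SUBJECT": ("the terminal", "#TERMINAL"), "OBJ": ("the names", "#LIST")}
print(map_to_concept(ok, TYPE_DEF, INPUT))
# {'DEVICE': 'the terminal', 'ARGS': 'the names'}
bad = {"OBJ": ("the terminal", "#TERMINAL")}
print(map_to_concept(bad, TYPE_DEF, INPUT))  # None
```

No case of "type" fills STEPOF in the figure, so this sketch leaves it unbound. Presumably it is supplied from the surrounding program specification.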

Noun and pronoun reference is facilitated by the context supplied by the selection
criteria of the descriptors of a concept. In

"It reads in a trial-item, matches the input to the internal concept
model, and prints the result of the match."

a referent must be found for the noun "input". There are two possibilities: the
INPUT created by the "read", and the trial-item which is the argument of the "read".
Since "match" is mapped to a concept (#PREDICATE) which requires that its ARGS
descriptor be a #DATA (rather than an #ALGORITHM like the "read"), the ambiguity is
resolved.


When the choice among possible referents cannot be decided on the basis of the
very general type checking outlined above, more situational checks are needed.
Consider

"It reads a list of numbers and a list of strings. If X is in the list
then..."

There are two possible referents for "the list": the number list and the string list. Since
they both satisfy the selectional criteria⁵ for the second argument of the #MEMBER
concept that "is in" maps into, something more context dependent is needed. Each concept has
a second layer of selectional requirements which are called when simple type
checking fails to narrow down the field of choices sufficiently. For #MEMBER, the
check succeeds if the first argument has the same type, or is referred to in the
same way, as the generic element of the second argument. So in the example, if X
were a string, "the list" would be matched to the string list, and if X were a number,
"the list" would be matched to the number list.

If a referent remains ambiguous after all tests have been
applied, the time-honored method of falling back on the most recently mentioned
possibility is used. Hopefully, the speaker has felt free to use a pronoun in an
ambiguous situation because the referent he had in mind was the most recently
mentioned possibility.


5 They are both sets.
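The three stages of referent selection described above can be sketched as a single function: type checking, then the concept's second-layer check, then recency. The dictionary records, the exact-match type test, and the helper names are assumptions made for illustration. A real system would consult a type hierarchy rather than test types for exact equality.

```python
def resolve(candidates, required_type, second_check=None):
    """Pick a referent; candidates are listed in order of mention, oldest first."""
    # 1. Simple type checking against the concept's selectional criteria.
    pool = [c for c in candidates if c["type"] == required_type]
    # 2. The concept's second layer of checks, used only when a tie remains.
    if len(pool) > 1 and second_check is not None:
        pool = [c for c in pool if second_check(c)] or pool
    # 3. Fall back on the most recently mentioned possibility.
    return pool[-1] if pool else None


def member_check(x_type):
    """#MEMBER: the list's generic element should match the type of X."""
    return lambda c: c.get("element") == x_type


# "It reads in a trial-item, matches the input ..."
inputs = [{"name": "the INPUT of the read", "type": "#ALGORITHM"},
          {"name": "the trial-item", "type": "#DATA"}]
print(resolve(inputs, "#DATA")["name"])  # the trial-item

# "It reads a list of numbers and a list of strings. If X is in the list ..."
lists = [{"name": "number list", "type": "set", "element": "number"},
         {"name": "string list", "type": "set", "element": "string"}]
print(resolve(lists, "set", member_check("number"))["name"])  # number list
print(resolve(lists, "set")["name"])  # string list: the most recently mentioned
```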




2.  Parsing 

Natural language processing begins with parsing. Determining the meaning of a
sentence requires knowing the main verb of the sentence and how the rest of the
words in the sentence relate to it. In this system, for example, mapping the
sentence "Print the list." into a structure which is an OUTPUT operation whose
argument is the referent of "the list" depends on knowing that the main verb of the
sentence is "print", the syntactic object of "print" is "the list", and the sentence is
an imperative.


2.1  The  Basic  Algorithm 

A parser allows one to store and utilize the information about sentence structure
needed to interpret sentences properly. The information that is stored is referred
to as the grammar, while the methods for applying the grammar to a particular
sentence are usually thought of as the parser. Reader is organized somewhat
differently from most parsers¹ in that Reader is not syntax directed. Writing a
grammar for Reader consists of specifying the processes which build the structure
of an input sentence. Thus the grammar writer specifies how the grammar is
actually applied to a sentence, as well as the grammar itself. Reader's function is
to provide the data structures the grammar is intended to use, the control structure
which activates the grammar, and programs for manipulating the data structures.

The two basic data structures that Reader supplies are the modifier list and the
stack. The modifier list is a list that the grammar writer can use to store words
whose use has not yet been determined. The stack is used to store the structure
built up while the parse is in progress. The next section describes the stack in
detail. A stack, a modifier list, a message about what has just happened to the top
of the stack, and a message concerning the entire stack constitute a partial parse.
The top of the stack message is usually a Lisp atom; e.g., a message of NOUN, VERB, or
CONJUNCTION means that a noun, verb or conjunction has just been added to the
top structure in the stack. The stack message is a list of features that the stack
has. Each feature is represented by an atom. Example features are "the stack
contains a verb structure with a verb that can accept a clause as one of its cases"
and "the stack represents a sentence which is an interrogative".

1 The parsers of Winograd and Riesbeck are also exceptions. See section 4.5.

The parse is performed by adding each word in the input (going from left to right) to
the partial parse formed by the addition of the previous words in the sentence. The
first word in the sentence is applied to "the initial partial parse", which consists of
the "initial stack" (a stack containing a single structure which will eventually hold
the main verb of the input sentence) and an empty modifier list. The "top of the
stack message" for the initial stack is BEGIN, and the message concerning the
entire initial stack is NIL, meaning that the stack has not acquired any features yet.
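The four components of a partial parse, and the initial partial parse every sentence starts from, can be written down as a small record. This sketch rests on several assumptions: the field names are ours, a Python dict stands in for a stack structure, and an empty Python list stands in for the Lisp NIL.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PartialParse:
    """One partial parse: a stack, a modifier list, and two messages."""
    stack: list                                          # bottom structure first
    modifiers: list = field(default_factory=list)        # words not yet placed
    top_message: Optional[str] = "BEGIN"                 # e.g. NOUN, VERB, CONJUNCTION
    stack_features: list = field(default_factory=list)   # empty list plays NIL


def initial_partial_parse():
    """The stack starts with one (empty) structure for the main verb."""
    return PartialParse(stack=[{}])


p = initial_partial_parse()
print(p.top_message, p.stack_features, len(p.stack), p.modifiers)  # BEGIN [] 1 []
```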

The process of adding words to the partial parse is controlled by the grammar. The
grammar consists of a set of programs, one for each syntactic word class,² which
contain the rules and conditions which specify when and how to add a particular
word class to a partial parse in a given configuration. In general, there may be more
than one way a word class can be added to a partial parse. It is also true that
many words belong to more than one word class. For instance, the word "like" can
be a noun ("His likes are different than mine."), a verb ("She likes him."), a
preposition ("a man like him."), a conjunction ("He plays like Jack used to."), or a
modifier ("men of like temperament."). These two facts (a word may be added to a
partial parse in more than one way, and a word may belong to more than one word
class) imply that the parser should be able to handle more than one partial parse of
the input at a time. However, it should be kept in mind that one way to achieve an
efficient parsing process is to write a grammar which minimizes the number of
possible parses the parser has to follow at once, while at the same time writing a
set of rules which adequately express English syntax. Section 3.2 shows some of
the methods used by Reader's grammar to avoid a multiplicity of partial parses.

2 The word classes the parser uses are VERB, PREPOSITION, NOUN, MODIFIER,
ARTICLE, CONJUNCTION, and PUNCTUATION.

The partial parses are placed on a list called the "partial parse list". The parser's
control structure is as follows:

1. sentence ← the list of words comprising the input sentence.

2. partial-parse-list ← a list of the initial partial parse.

3. WHILE sentence DO

4.     Apply the next word in sentence to each partial parse in
       partial-parse-list, using the program associated with each
       word class the word belongs to.

5.     Reset sentence by removing the first word in it.

6.     Reset partial-parse-list to a list of the partial parses formed
       in step 4.

7. Output partial-parse-list.


Step 5. does not imply that the grammar programs cannot look ahead in the input
sentence and use more than one word at a time. If a grammar program continues a
partial parse P by applying the first n (n > 1) words in sentence to it, a message is
left which prevents the next n - 1 words from being applied to P. This presentation
of the control structure is accurate with the exception that steps 6. and 7. are a
bit more complex than they have been made to appear. They will be explained in
more detail in later sections.
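The control structure, including the look-ahead message described for step 5, can be sketched in a few lines. This is an illustrative sketch, not Reader's code. Several parts are assumptions: the grammar-program interface (returning new partial parses together with the number of words consumed), the skip counter that plays the role of the look-ahead message, and the toy lexicon. Step 7 here simply returns the partial parses, without the collapsing and formatting described later.

```python
def parse(sentence, initial, grammar, word_classes):
    """Steps 1-7 above, with n-word look-ahead handled by a skip count.

    grammar maps a word class to a program taking (partial_parse, words) and
    returning (new_partial_parse, n) pairs, n being the words it consumed.
    """
    words = list(sentence)                           # step 1
    partial_parse_list = [(initial, 0)]              # step 2 (0 words to skip)
    while words:                                     # step 3
        formed = []
        for pp, skip in partial_parse_list:
            if skip:                                 # message left by look-ahead:
                formed.append((pp, skip - 1))        # this word was already used
                continue
            for wc in word_classes(words[0]):        # step 4
                for new_pp, n in grammar[wc](pp, words):
                    formed.append((new_pp, n - 1))
        words = words[1:]                            # step 5
        partial_parse_list = formed                  # step 6
    return [pp for pp, _ in partial_parse_list]      # step 7 (no collapse here)


# Toy grammar: each program just records (word, class) on a tuple.
LEXICON = {"print": ["VERB"], "the": ["ARTICLE"], "list": ["NOUN", "VERB"]}


def add(word_class):
    return lambda pp, words: [(pp + ((words[0], word_class),), 1)]


grammar = {wc: add(wc) for wc in ("VERB", "ARTICLE", "NOUN")}
print(len(parse("print the list".split(), (), grammar, LEXICON.get)))  # 2

# A program that looks ahead: ARTICLE consumes the article and the next word.
grammar2 = dict(grammar, ARTICLE=lambda pp, words: [(pp + ((" ".join(words[:2]), "NG"),), 2)])
print(parse("print the list".split(), (), grammar2, LEXICON.get))
# [(('print', 'VERB'), ('the list', 'NG'))]
```

Each grammar program sees only one partial parse at a time, which matches the point made below that the parallelism is invisible to the grammar writer.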

The control structure indicates that the parallel processing is invisible to the
grammar writer. This means that in writing the grammar programs, the grammar
writer need only concern himself with one stack and one modifier list, since each
grammar program is called on each partial parse in partial-parse-list in turn.


2.2  Stack  structures  and  collapsing 

The stack is the major data structure that Reader uses. Its function is to store the
structures built up during the parse until it is decided how the structures should be
attached to one another. This treatment allows for easy handling of a certain type
of ambiguity that arises frequently in English utterances.

Consider the sentence "I had another look at it". It can mean either "I asked
someone else to look at it" or "I took one more look at it". The ambiguity arises from
the different uses of "had", "look" and "another" in each interpretation.

The sentence "John spoke to the man with Bill" is ambiguous in a different way. It
might mean "John and Bill spoke to the man." or "John spoke to the man who was
with Bill." In this sentence the ambiguity derives from the fact that "with Bill" can
be used to specify either who acted with John, or who was near the man. In each
meaning, the words of the sentence have been used in the same fashion.
Ambiguities of this sort, one constituent of an utterance being a possible modifier
for more than one word in the utterance, have been referred to as "permanent
predictable ambiguities" in [Sager 73].


The stack allows Reader to handle ambiguities of the second kind by allowing for
the structuring of most of the constituents of the sentence before it is decided
which words they will modify. The elements of the stack are called stack
structures. Two different types of stack structures are employed by Reader:
preposition structures and verb structures. The sentence "John lost the toy he
bought in the woods on Sunday." would be parsed into the following stack:

4. [on  Sunday] 

3.  [In  the  woods] 

2.  [he  bought] 

1. [John lost the toy]

1. and 2. would be represented by verb structures, and 3. and 4. by preposition
structures. Verb and preposition structures can be filled in as follows:

Verb structures        Preposition structures

noun3                  noun
noun2                  prep
noun1                  adverbs
verb-group             measure
adverbs                message
cases
function
measure
message

The noun slots are filled by noun groups. A noun group consists of a list
of the head noun followed by its modifiers. A verb may have one, two or
three of its noun slots filled. A preposition may have its noun slot filled
or not.

The verb-group slot is filled by a list of verbs. Each verb consists of a
root and an ending.

The adverbs slot is filled by a list of modifiers of the verb group or
preposition.

The cases slot is filled by the cases the verb has that are introduced
by prepositions and conjunctions.

The function slot contains the function of the verb structure. MAIN is
used to indicate that a verb structure holds the main verb of an
utterance, RC indicates that a verb structure is being used as a relative
clause, etc.

The prep slot holds the preposition of a preposition structure.

The message slot contains information relevant to the stack structure.
Its contents are controlled by the grammar. We will see examples of its
uses when we discuss the grammar.

The measure slot contains the parser's rating of each structure. The
rating is used to help the parser choose among competing parses. It will
be defined in section 4.1.
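The slots just listed, together with the printing convention used below (slot-value pairs, empty slots omitted), can be sketched as two record types. Several details are assumptions for illustration: the Python field names, the use of tuples for noun groups, and the `lisp`/`show` helpers. Slot order follows the printed examples rather than the diagram above.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class VerbStructure:
    verb_group: list = field(default_factory=list)  # (root, ending) pairs
    noun1: object = None                            # noun groups, e.g. ("TOY", "THE")
    noun2: object = None
    noun3: object = None
    adverbs: list = field(default_factory=list)
    cases: list = field(default_factory=list)       # introduced by preps/conjunctions
    function: Optional[str] = None                  # MAIN, RC, ...
    measure: list = field(default_factory=list)
    message: object = None


@dataclass
class PrepStructure:
    prep: Optional[str] = None
    noun: object = None
    adverbs: list = field(default_factory=list)
    measure: list = field(default_factory=list)
    message: object = None


def lisp(value):
    """Render nested tuples/lists in Lisp list notation."""
    if isinstance(value, (list, tuple)):
        return "(" + " ".join(lisp(v) for v in value) + ")"
    return str(value)


def show(structure):
    """Slot-value pairs with empty slots omitted, as in the printed stacks."""
    lines = []
    for f in fields(structure):
        value = getattr(structure, f.name)
        if value is None or value == [] or value == ():
            continue
        label = "VERB" if f.name == "verb_group" else f.name.upper()
        lines.append(f"{label}: {lisp(value)}")
    return "\n".join(lines)


print(show(VerbStructure(verb_group=[("LOSE", "ED")], noun1="JOHN",
                         noun2=("TOY", "THE"), function="MAIN")))
print(show(PrepStructure(prep="ON", noun="SUNDAY")))
```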


Throughout this paper, stack structures will be printed as a collection of slot-value
pairs. Empty slots will not be printed. Under this scheme, the stack for the
sentence above would be printed as

4.  PREP: ON
    NOUN: SUNDAY

3.  PREP: IN
    NOUN: (WOODS THE)

2.  VERB: ((BUY ED))
    NOUN1: HE
    FUNCTION: RC

1.  VERB: ((LOSE ED))
    NOUN1: JOHN
    NOUN2: (TOY THE)
    FUNCTION: MAIN

John lost the toy he bought in the woods on Sunday.


The stack could be interpreted in several different ways:

a. John lost a toy. He bought it in the woods. He bought it on Sunday.

b. John lost a toy. He bought it in the woods. He lost it on Sunday.

c. John lost a toy. The toy was lost on Sunday. It was lost in the woods.
John bought the toy.

etc.


The process of determining which of the interpretations was actually intended by
the speaker is referred to as collapsing the stack, since finding the correct
interpretation of the stack consists of reducing the stack to one stack structure. If
we accept meaning c. as the proper interpretation of the above sentence, then the
single stack structure that represents that meaning of the stack is

VERB: ((LOSE ED))
NOUN1: (TOY THE {BUY PN [SUB HE]})
NOUN2: JOHN
CASES: ((WHERE (IN (WOODS THE))) (WHEN (ON SUNDAY)))
FUNCTION: MAIN

where "he bought" specifies which toy, "on Sunday" specifies when the toy was
lost, and "in the woods" specifies where the toy was lost.

The parser must consult with its deductive system³ during a Collapse of the stack.
The reason that the third meaning seems to be right is that one is unlikely to buy a
toy in the woods, since there usually aren't any stores located in the woods. The
parser also needs to know that Sunday is a date rather than a location like
the woods. There is, however, some syntactic knowledge embedded in the stack.
The parser never considers

d. A toy was lost in the woods by John. John had bought the toy.
The toy was bought on Sunday.

as a possible meaning for the sentence, since d. requires that stack structure 4.
modifies 2., while 3. modifies 1. English syntax does not allow such crossovers, so
the parser never has to consider d. as a possible meaning.

3 The deductive system for the Reader/Interpreter system is the Interpreter. In
discussing the parser in general, we will use "its deductive system" to mean the
program which calls the parser and is able to reason about the subject domain of
the sentences being parsed.
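The no-crossover constraint can be illustrated by enumerating the ways a four-structure stack might collapse. This relies on the simplifying assumption that every structure above the bottom one attaches to exactly one structure below it. The function names are hypothetical. Reading d. (4 onto 2 while 3 onto 1) is ruled out on syntactic grounds before any semantic rating is done.

```python
from itertools import product


def crosses(a, b):
    """Arcs are (head, dependent) stack positions; True if they cross."""
    (j1, i1), (j2, i2) = sorted([a, b])
    return j1 < j2 < i1 < i2


def collapses(n):
    """Every non-crossing way to attach structures 2..n to ones below them."""
    dependents = range(2, n + 1)
    result = []
    for heads in product(*[range(1, i) for i in dependents]):
        arcs = list(zip(heads, dependents))        # (head, dependent) pairs
        if not any(crosses(a, b) for k, a in enumerate(arcs) for b in arcs[k + 1:]):
            result.append(dict(zip(dependents, heads)))
    return result


options = collapses(4)
print(len(options))                     # 5
print({2: 1, 3: 1, 4: 1} in options)    # True: reading c.
print({2: 1, 3: 1, 4: 2} in options)    # False: reading d. (3 onto 1, 4 onto 2)
```

Of the six attachment choices for a four-structure stack, only reading d. has crossing arcs, so five remain for the deductive system to rate.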

The communication channel between the parser and the Interpreter is a function
named Format. Format is called to evaluate a structure just before it is attached to
another structure during a Collapse.⁴ The algorithm used by Collapse ensures that
once a structure has been attached to another, it cannot be modified (i.e., have
another structure attached to it). Formatting serves the dual purpose of preparing
a structure for output, and providing the deductive system with an opportunity to
rate the likelihood that the speaker intended the words in the structure to be
grouped with each other. The rating of a formatted structure is merged with the
contents of the measure slot of the structure it is being attached to. Thus the
measure slot of a structure contains the ratings of all the structures that have
been attached to that structure. The measure of a structure is discussed in
section 4.1.
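The bookkeeping described above can be sketched as follows. Each attachment formats the attached structure, merges its rating into the receiver's measure slot, and thereafter forbids further modification. Two things here are assumptions: representing a measure as a plain list of ratings, and merging by list concatenation. The actual definition of a measure is deferred to section 4.1.

```python
class Structure:
    """A stack structure reduced to the parts this sketch needs."""

    def __init__(self, name):
        self.name = name
        self.measure = []     # ratings of every structure attached to this one
        self.frozen = False   # True once this structure has been attached


def attach(child, parent, format_rating):
    """Format child, merge its rating into parent's measure, then freeze child."""
    if parent.frozen:
        raise ValueError(f"{parent.name!r} is already attached and cannot be modified")
    rating = format_rating(child)                     # the deductive system's say
    parent.measure.extend(child.measure + [rating])   # child's own measure comes along
    child.frozen = True


# Hypothetical ratings standing in for the Interpreter's judgements.
RATINGS = {"he bought": "ok", "in the woods": "ok", "on Sunday": "ok"}


def format_rating(structure):
    return RATINGS[structure.name]


s1, s2, s3, s4 = (Structure(n) for n in
                  ("John lost the toy", "he bought", "in the woods", "on Sunday"))

# Reading c.: structures 4, 3 and 2 all attach directly to structure 1.
for s in (s4, s3, s2):
    attach(s, s1, format_rating)
print(s1.measure)  # ['ok', 'ok', 'ok']
```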

Collapse chooses which one of the possible stack structures the stack could be
collapsed to by picking the structure with the best measure. If there is more than
one partial parse active at the end of the sentence, Reader returns the one(s)
whose collapsed stacks have the best measure. The format of a preposition
structure is its measure and a list of the preposition, adverbs and noun; the format
of a verb structure is its measure and a list of the root of the main verb, the tense
of the verb group, the verb's adverbs, and the verb's cases. Measure is only used
to select from among syntactically equivalent parses, so if the only reading a
sentence admits results in a bad measure, a parse will be found anyway.

4 Format is also called to evaluate the final structure obtained from the parsing
process.

When the stack for "John lost the toy he bought in the woods on Sunday." is
collapsed, the measure of any resulting structure which includes structure 3. (in the
woods) attached to structure 2. (he bought) will be worse than those that don't.
The measure of structure 2. modified by structure 3. will be "unacceptable"
(see section 4.1), since the parser's deductive system would "know" that "the
woods" does not satisfy the requirements that "buy" has for places where one can
buy things. Section 5.5 explains how this "know" is implemented in the
Reader/Interpreter system.


We can now mention the complication referred to in step 7. of the control structure
presented in section 2.1. Step 7. was originally "Output the list of partial parses".
What really happens is that Reader collapses the stacks associated with each
partial parse, each structure resulting from the collapse is formatted, and
Reader then outputs a list of the formatted structure(s) with the best measure.


There are two points about the stack which should be emphasized:

1. There are only two reasons for collapsing the stack: either the end of
the sentence has been reached, in which case the stack is collapsed
down to one structure, or the application of a word in the sentence to a
partial parse results in that word being added to a stack structure which
is not at the top of the stack. In the latter case, the stack is collapsed
down to the structure that is receiving the word.

2. Any two structures resulting from the collapse of a stack are
syntactically equivalent. This means that either both or neither will
result in a parse of the sentence, so we are justified in using semantics
to discard all but one of the structures resulting from a collapse, since
syntactic information will not enable us to choose between them.


2.3 Reader's output


2.3.1  Cases 


Given a sentence S, Reader's output consists of the main verb of S, together with
its cases. If S is the simple sentence "Bill hits John", then Reader's output would
be the parse below:

{HIT NN
   [SUB BILL]
   [OBJ JOHN]
}

The open bracket, "{", signals the beginning of a presentation of a verb and its
cases. NN is a tense marker whose meaning will be explained below. The SUB case
(cases are introduced by square brackets, "[") of "hit" is "Bill" and the OBJ case is
"John".
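The bracket notation can be produced mechanically from a verb, its tense marker, and an ordered list of cases. The `render` helper below is a hypothetical illustration of the printed format only. It says nothing about how Reader builds the cases.

```python
def render(verb, tense, cases):
    """Print a verb, its tense marker and its cases in the bracket notation."""
    lines = [f"{{{verb} {tense}"]                                  # "{" opens the verb
    lines += [f"   [{name} {filler}]" for name, filler in cases]   # "[" opens a case
    lines.append("}")
    return "\n".join(lines)


print(render("HIT", "NN", [("SUB", "BILL"), ("OBJ", "JOHN")]))
```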

We are using "case" in a different sense than most of the current literature does.
In the literature, "case" is usually used to refer to "deep case", a concept
popularized by Fillmore in [Fillmore 68]. A good definition of "deep case" can be
found in [Bruce 75]: "The deep cases are binary relations which specify an event
regardless of the surface realization of that description as a sentence or noun
phrase". To see exactly what this means, we will consider a number of sentences
involving the verb (event) "hit". For this example, we will suppose that "hit" has
three deep cases: the entity that is receiving the effect of the hit (OBJECT), the
thing the object is being hit with (INSTRUMENT), and the entity that is instigating
the hitting (AGENT). Then in

1. Bill was hit by the hammer.

2. John hit Bill with the hammer.

3. Bill was hit with the hammer by John.

4. The hammer hit Bill.

5. John hit Bill.

"Bill"  is  the  OBJECT  In  all  five  sentences,  "hammer"  Is  the  INSTRUMENT  In  the  first 

four  sentences,  and  "John"  Is  the  AGENT  In  sentences  2. ,3.  and  6.  Consider  the 

knowledge  needed  to  choose  the  cases  of  a "hit".  In  sentence  6.,  the  AGENT  is 

distinguished  from  the  OBJECT  by  their  relative  positions  about  the  verb.  The 

surface  structure  of  the  sentence,  then,  Is  one  source  of  Information  In  determining 

a verb's  cases.  It  is  obviously  not  the  only  source.  Sentence  4.  has  the  same 

surface  structure  as  sentence  5.,  yet  the  noun  preceding  the  verb  Is  considered 

the  INSTRUMENT,  rati  ’r  than  the  AGENT.  Furthermore,  If  we  say, 

"George  went  berserk.  He  battered  John  Into  unconsciousness, 
picked  him  up,  and  hurled  him  at  Bill.  John  hit  Bill.", 

then  John  Is  the  INSTRUMENT  of  "hit"  In  the  last  sentence.  Therefore,  determining 

cases  requires  the  surface  structure  of  the  sentence  as  well  as  Information  about 

the  objects  the  sentence  refers  to,  and  the  context  the  sentence  was  uttered  In. 

Reader produces a set of cases which are derived from the surface structure of
the sentence. A deductive system can then use Reader's cases in combination
with the information it has about the concepts mentioned in the sentence to derive
its own cases.

The three primary cases used by Reader are SUB, OBJ and IOB (indirect object). In
a passive sentence, one in which the verb group is a verb phrase whose last two
verbs are the verb "to be" and the main verb inflected with an "ed" or "en" ending,
the OBJ precedes the verb and the SUB is introduced by "by". If the sentence
is not passive, the OBJ follows immediately after the verb and the SUB precedes
the verb. The IOB is a noun that can modify a verb, without needing a preposition to
introduce it, only in the presence of both the SUB and OBJ.
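The surface rule for SUB and OBJ can be illustrated with a minimal Python sketch. It is not Reader's Lisp code. It assumes a toy passive test that looks for a form of "to be" followed by a word ending in "ed" or "en", plus a small hypothetical list of irregular participles such as "hit". The names `is_passive` and `primary_cases` are made up for illustration, and the sketch takes the object of "by" as the SUB without consulting a deductive system.

```python
# Forms of "to be" and a few irregular past participles (toy lists).
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
IRREGULAR_PARTICIPLES = {"hit", "shot"}


def is_passive(verb_group):
    """True if the last two verbs are a form of "to be" followed by a main
    verb with an "ed"/"en" ending (or a listed irregular participle)."""
    if len(verb_group) < 2:
        return False
    aux, main = verb_group[-2].lower(), verb_group[-1].lower()
    participle = main.endswith(("ed", "en")) or main in IRREGULAR_PARTICIPLES
    return aux in BE_FORMS and participle


def primary_cases(before, verb_group, after=None, by_object=None):
    """Assign SUB and OBJ from the nouns around the verb group.

    before:    the noun preceding the verb group
    after:     the noun immediately following it (active clauses)
    by_object: the object of "by" (passive clauses), if any
    """
    cases = {}
    if is_passive(verb_group):
        cases["OBJ"] = before            # passive: the OBJ precedes the verb
        if by_object is not None:
            cases["SUB"] = by_object     # and "by" introduces the SUB
    else:
        cases["SUB"] = before            # active: the SUB precedes the verb
        if after is not None:
            cases["OBJ"] = after         # and the OBJ follows it
    return cases


print(primary_cases("JOHN", ["hit"], after="BILL"))
# {'SUB': 'JOHN', 'OBJ': 'BILL'}
print(primary_cases("BILL", ["was", "hit"], by_object="JOHN"))
# {'OBJ': 'BILL', 'SUB': 'JOHN'}
```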


"John" is the IOB in "Bill gives John the book." since we cannot say "John gives
Bill." to mean that "Bill received something from John.", but can say "Bill gives the
book." to indicate that "A book was given to someone by Bill." Similarly, John is the
IOB in "Bill names the cat John" since we can't say "Bill names John." to mean that
"Bill has given the name JOHN to something.", but can say "Bill names the cat." to
indicate that Bill has given some name to the cat. Another way to look at this is that
(without resorting to prepositions) you cannot say (using the verb "give") who you
are giving something to without mentioning what you are giving, and similarly you
can't mention what you are naming something without mentioning the thing being
named. The reversal in the normal order of IOB and OBJ that verbs like "name"
exhibit is considered a syntactic property of the verb. Unless a verb is tagged with
this property, Reader assumes that it takes its OBJ and IOB in the normal order.
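The ordering rule for bare nouns after an active verb can be sketched as follows. This is illustrative Python, not Reader's code. The tag set `REVERSED_ORDER_VERBS` and the function `post_verb_cases` are hypothetical stand-ins for the syntactic property described above. A single bare noun is taken as the OBJ, because an IOB only appears together with an OBJ.

```python
# Hypothetical lexicon tag for verbs whose bare objects come in the
# reversed order OBJ, IOB ("Bill names the cat John").
REVERSED_ORDER_VERBS = {"NAME"}


def post_verb_cases(verb, nouns):
    """Assign cases to the bare nouns following an active verb.

    A lone noun is the OBJ.  An IOB is only possible when there is also an
    OBJ, so two nouns are IOB, OBJ in the normal order ("Bill gives John
    the book") and OBJ, IOB for verbs tagged as reversed.
    """
    if not nouns:
        return {}
    if len(nouns) == 1:
        return {"OBJ": nouns[0]}
    if len(nouns) == 2:
        first, second = nouns
        if verb in REVERSED_ORDER_VERBS:
            return {"OBJ": first, "IOB": second}
        return {"IOB": first, "OBJ": second}
    raise ValueError("at most two bare nouns can follow the verb")


print(post_verb_cases("GIVE", ["JOHN", "(BOOK THE)"]))
# {'IOB': 'JOHN', 'OBJ': '(BOOK THE)'}
print(post_verb_cases("NAME", ["(CAT THE)", "JOHN"]))
# {'OBJ': '(CAT THE)', 'IOB': 'JOHN'}
```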

With the exception of "by" and "to", Reader does not try to assign meaningful case
names to nouns introduced by prepositions, since the meaning of the modification
between a verb and a prepositional phrase depends on both the verb and the
object of the preposition. The deductive system is expected to supply a case
name when it judges the appropriateness of the modification.

In passive sentences, "by" frequently introduces the SUB. When Reader parses
such a sentence it returns the object of "by" as the SUB of the verb if the
deductive system agrees that the object could serve as the SUB. Given the
sentence "Bill was shot by Jack", Reader would ask the deductive system whether
Jack could shoot Bill. If the answer were "yes", Jack would appear as the SUB
case of "shoot". Change the sentence to "Bill was shot by the door" and the
deductive system would answer "No, doors cannot shoot", enabling Reader to use
"by the door" to specify the location of the shooting.

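The exchange with the deductive system can be pictured as a callback. In this minimal Python sketch, `could_be_sub` is a hypothetical stand-in for the deductive system's answer. The toy rule "only people can shoot" exists only to reproduce the Jack and door examples. When the answer is "no", the sketch keeps the phrase as an ordinary PREP case; that encoding is an assumption, not the report's.

```python
def attach_by_phrase(verb, cases, by_object, could_be_sub):
    """Attach the object of "by" in a passive clause.

    could_be_sub(verb, candidate, obj) stands in for the deductive system:
    it answers whether candidate could be the SUB of verb acting on obj.
    Returns a new case dictionary; the input is not modified.
    """
    new_cases = dict(cases)
    if "SUB" not in new_cases and could_be_sub(verb, by_object, new_cases.get("OBJ")):
        new_cases["SUB"] = by_object
    else:
        # Keep "by ..." as an ordinary prepositional modifier
        # (e.g. the location of the action).
        new_cases["PREP"] = ("BY", by_object)
    return new_cases


# A toy deductive system: only people can shoot.
PEOPLE = {"JACK", "BILL"}


def toy_could_be_sub(verb, candidate, obj):
    return verb == "SHOOT" and candidate in PEOPLE


print(attach_by_phrase("SHOOT", {"OBJ": "BILL"}, "JACK", toy_could_be_sub))
# {'OBJ': 'BILL', 'SUB': 'JACK'}
print(attach_by_phrase("SHOOT", {"OBJ": "BILL"}, "(DOOR THE)", toy_could_be_sub))
# {'OBJ': 'BILL', 'PREP': ('BY', '(DOOR THE)')}
```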

"To" is treated similarly to "by" by Reader in that Reader assumes that "to" always
introduces an IOB if the syntax of the sentence permits this. Therefore,
"Bill gives John the book" and "Bill gives the book to John" parse to

(GIVE NN
    [SUB BILL]
    [IOB JOHN]
    [OBJ (BOOK THE)]
)

and

(GIVE NN
    [SUB BILL]
    [OBJ (BOOK THE)]
    [IOB JOHN]
)

respectively.


The parses for the five example sentences are:

Bill was hit by the hammer.

(HIT PN
    [OBJ BILL]
    [SUB (HAMMER THE)]
)

John hit Bill with the hammer.

(HIT PN
    [SUB JOHN]
    [OBJ BILL]
    [PREP (WITH (HAMMER THE))]
)

Bill was hit with the hammer by John.

(HIT PN
    [OBJ BILL]
    [SUB JOHN]
    [PREP (WITH (HAMMER THE))]
)

The hammer hit Bill.

(HIT PN
    [SUB (HAMMER THE)]
    [OBJ BILL]
)

John hit Bill.

(HIT PN
    [SUB JOHN]
    [OBJ BILL]
)




We can see that SUB corresponds to either AGENT or INSTRUMENT, and that OBJ
corresponds to OBJECT in the case system we had made up for "hit".

To translate Reader's cases into the "hit" case system one would only have to
decide which SUBs were INSTRUMENTs and which were AGENTs, equate OBJECT
with OBJ, be aware that "with" can introduce the INSTRUMENT, and be able to
distinguish when "with" refers to an instrument and when it doesn't. This is a
non-trivial task, since we could say

"He hit John with Bill" (accomplice)

"He hit John with vim and vigor" (method)

"He hit John with malice" (emotion)


Section 5.2 explains how Reader's cases are mapped into the Interpreter's case
system.
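Below is a toy Python sketch of the translation into the "hit" case system. It is not the mapping of section 5.2. The predicates `is_animate` and `with_is_instrument` are hypothetical stand-ins for knowledge the deductive system would supply. Context is ignored, so the "George went berserk" reading, where John is the INSTRUMENT, would come out wrong.

```python
def hit_cases(reader_cases, is_animate, with_is_instrument):
    """Translate Reader's cases for "hit" into OBJECT/AGENT/INSTRUMENT.

    reader_cases: {"SUB": ..., "OBJ": ..., "PREP": [(preposition, noun), ...]}
    is_animate, with_is_instrument: stand-ins for deductive-system knowledge.
    An animate SUB is always taken to be the AGENT (context is ignored).
    """
    deep = {}
    if "OBJ" in reader_cases:
        deep["OBJECT"] = reader_cases["OBJ"]           # OBJ maps directly
    sub = reader_cases.get("SUB")
    if sub is not None:
        deep["AGENT" if is_animate(sub) else "INSTRUMENT"] = sub
    for prep, noun in reader_cases.get("PREP", []):
        if prep == "WITH" and with_is_instrument(noun):
            deep["INSTRUMENT"] = noun                  # "with the hammer"
    return deep


ANIMATE = {"JOHN", "BILL", "HE"}                       # toy knowledge


def is_animate(noun):
    return noun in ANIMATE


def is_tool(noun):
    return noun == "(HAMMER THE)"


print(hit_cases({"SUB": "JOHN", "OBJ": "BILL",
                 "PREP": [("WITH", "(HAMMER THE)")]}, is_animate, is_tool))
# {'OBJECT': 'BILL', 'AGENT': 'JOHN', 'INSTRUMENT': '(HAMMER THE)'}
print(hit_cases({"SUB": "(HAMMER THE)", "OBJ": "BILL"}, is_animate, is_tool))
# {'OBJECT': 'BILL', 'INSTRUMENT': '(HAMMER THE)'}
```

"He hit John with Bill" is handled by the same rule: Bill is not a tool, so the "with" phrase is left unassigned, as the deductive system would have to decide its role.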


Reader actually uses more cases than the primary ones mentioned above. But
the other cases are essentially ad hoc ones that Reader uses to store modifiers of
the verb. Any preposition or conjunction (not top-level) defines its own case. As
an example, consider "John pushed Janet into the closet because he thought Bill
would see her.", which is parsed to:

(PUSH PN
    [SUB JOHN]
    [OBJ JANET]
    [PREP (INTO (CLOSET THE))]
    [BECAUSE (THINK PN
                 [SUB HE]
                 [WHAT (SEE (NN WOULD)
                           [SUB BILL]
                           [OBJ HER]
                       )]
             )]
)


John and Janet are the SUB and OBJ of "push". "Into the closet" is a preposition case
of "push", filling in where the OBJ was pushed to. The conjunction "because" fills in
the presumed reason the event took place, and is considered a case of the verb. It
contains the verb clause whose main verb is "think". "He" is the SUB of "think".
"What the SUB is thinking" is stored in the WHAT case of "think". The WHAT case
contains the verb clause whose main verb is "see".

2.3.2  Tense  markers 

Many verb clauses contain verb groups rather than just single verbs. A verb group
can be composed of adverbs, modals and other verbs. The information contained in
a verb group that a deductive system needs is a list of adverbs and modals, the
root of the main verb, and the tense of the verb group. Reader saves the modals
and adverbs and returns them in appropriate slots in the parse structure. The root
of the main verb of the sentence is similarly returned. This means that Reader must
supply the tense of the verb as a separate piece of information. Reader uses six
basic tense symbols. These are shown in Figure 2.1, together with an example of
the verb group each represents.




VERB GROUP            TENSE

I walk                NN     The present tense of the verb without any
                             auxiliary verbs.

I walked              PN     The past tense of the verb without any
                             auxiliary verbs.

I will walk           FN     The auxiliary "will" followed by the
                             uninflected main verb.

I have walked         NP     The present tense of the auxiliary verb "have"
                             followed by the main verb in past tense.

I had walked          PP     The past tense of the auxiliary verb "have"
                             followed by the main verb in past tense.

I will have walked    FP     The auxiliary "will", followed by the auxiliary
                             "have", followed by the main verb in past tense.

Figure 2.1

Verb tenses

The tense markers are motivated by an analysis found in [Bruce 75]. Simplified, it
says that a tense consists of a set of binary relations on a set of reference points.
For instance, the tense of "had walked" consists of the relations on the three
reference points: "the time of the speech" (S1), "the time of the subject" (S2),
and "the time of the action" (S3). S2 is in the Past of S1, and S3 is in the Past of
S2, so the tense of the verb group is Past-Past, or PP. Similarly, the tense of "have
walked" is Now-Past, or NP, since the "time of the subject" is the same (Now) as
the "time of the speech" and the "time of the action" is in the Past of the "time of
the subject". To see how this works, consider the sentences:

1.  George, the club president, has walked through these halls.  (NP)

2.  George, the club president, walked through these halls.  (PN)




In 1., the "time of the action" is in the past of the "time of the subject", so we
may not assume that George was president when he walked in these halls, but we
do know that he is president now, since the time of the subject and speech are the
same. In 2., the time of the action and subject are the same, so we know that
George was president when he walked through these halls, but is not necessarily
president now.
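Each two-letter symbol in Figure 2.1 can be read as two relations: S2 relative to S1, then S3 relative to S2. This small Python sketch encodes that reading and the inference drawn in the George example. The function names and the relation labels "future", "same" and "past" are illustrative assumptions.

```python
# Letters for where a later reference point lies relative to an earlier one.
LETTER = {"future": "F", "same": "N", "past": "P"}


def tense_symbol(subject_vs_speech, action_vs_subject):
    """Build a basic tense symbol from two relations:
    S2 (time of the subject) relative to S1 (time of the speech), then
    S3 (time of the action) relative to S2."""
    if action_vs_subject not in ("same", "past"):
        raise ValueError("the basic tenses place S3 at or before S2")
    return LETTER[subject_vs_speech] + LETTER[action_vs_subject]


def subject_description_holds(symbol):
    """What a tense guarantees about a description of the subject, such as
    "the club president": it holds now if S2 = S1, and it held during the
    action if S3 = S2."""
    return {"now": symbol[0] == "N", "at_action": symbol[1] == "N"}


print(tense_symbol("past", "past"))      # PP, as in "had walked"
print(subject_description_holds("NP"))   # {'now': True, 'at_action': False}
print(subject_description_holds("PN"))   # {'now': False, 'at_action': True}
```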

We get six more tense symbols by considering verb groups whose main verb ends in
"ing". These tenses are represented by appending a "C" (continuing aspect) to
the tenses above:

VERB GROUP                  TENSE

I am walking                NNC
I was walking               PNC
I will be walking           FNC
I have been walking         NPC
I had been walking          PPC
I will have been walking    FPC

Figure 2.2

Tenses for verbs with a continuing aspect

When a verb is used as an infinitive, e.g., "to hit" in "Bill wants to hit John", the
tense marker returned is "INF". When a verb appears with an "ing" ending and no
auxiliary verbs, as in "The man sitting on the chair...", the tense marker returned is
"CC" (an arbitrary symbol). In terms of tense markers, passive constructions are
indistinguishable from regular constructions (the order of the cases determines
whether a construction is passive or not), so the tense of "is walked" is equivalent
to the tense of "walks", namely NN. Verb groups consisting of the auxiliary verb
"do" and an uninflected main verb (e.g., "He did go...") are given the tense of the
auxiliary "do".
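The rules in Figures 2.1 and 2.2 and the special cases above can be combined into one mapping from a verb group to a tense marker. The following Python sketch assumes a tiny hypothetical lexicon of inflected forms and is not Reader's implementation. The first letter comes from "will" or from the tense of the first verb. "have" adds P, and "be" followed by an "ing" form adds C. A passive "be" and a supporting "do" add nothing.

```python
# Toy lexicon (assumed): surface word -> (root, form), where form is
# "base" (uninflected), "pres", "past", "ing" or "en" (past participle).
LEXICON = {
    "walk": ("walk", "base"), "walks": ("walk", "pres"),
    "walked": ("walk", "past"), "walking": ("walk", "ing"),
    "sitting": ("sit", "ing"),
    "am": ("be", "pres"), "is": ("be", "pres"), "are": ("be", "pres"),
    "was": ("be", "past"), "were": ("be", "past"),
    "be": ("be", "base"), "been": ("be", "en"),
    "have": ("have", "base"), "has": ("have", "pres"), "had": ("have", "past"),
    "do": ("do", "base"), "does": ("do", "pres"), "did": ("do", "past"),
}


def tense_of(group):
    """Return (root of the main verb, tense marker) for a verb group."""
    words = [w.lower() for w in group]

    def root(i):
        return LEXICON[words[i]][0]

    def form(i):
        return LEXICON[words[i]][1]

    if words[0] == "to":                        # infinitive: "to hit"
        return LEXICON[words[1]][0], "INF"
    if len(words) == 1 and form(0) == "ing":    # bare "-ing" form: "sitting"
        return root(0), "CC"

    i = 0
    if words[0] == "will":                      # S2 in the future of S1
        first, i = "F", 1
    else:                                       # tense of the first verb
        first = "P" if form(0) == "past" else "N"

    second, aspect = "N", ""
    if root(i) == "have" and i + 1 < len(words):
        second, i = "P", i + 1                  # perfect: S3 in the past of S2
    if root(i) == "be" and i + 1 < len(words):
        if form(i + 1) == "ing":
            aspect = "C"                        # continuing aspect
        i += 1                                  # a passive "be" adds nothing
    elif root(i) == "do" and i + 1 < len(words):
        i += 1                                  # "did go": tense of "do"
    return root(i), first + second + aspect


print(tense_of(["will", "have", "been", "walking"]))   # ('walk', 'FPC')
```

Verb groups with "go" as an auxiliary are deliberately not handled, in line with the treatment described next.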

We have left out tenses which require the verb "to go" as an auxiliary verb. The
reason is that verb groups using "go" as an auxiliary are ambiguous. A verb group
like "I am going to walk..." might mean either "In the future some time, I will walk" or
"I am actually going to some place (the beach, for example) in order to walk".
Rather than try to resolve this ambiguity, Reader treats the infinitive as a case of
the verb "go" and expects the deductive system to be aware of the possible
ambiguity and to have enough information to resolve it. Therefore "I am going to
walk" is parsed as

(GO NNC
    [SUB I]
    [INF (WALK INF
             [SUB !match_to_SUB]
         )]
)

The infinitive clause "to walk" is treated as a case of the verb "go" (INF). The
system reading the parse must be aware that it can be interpreted as though the
main verb were the verb of the INF case ("walk"), with a tense derived from the
verb group "am going to walk". The SUB of "walk" is a dummy noun that should be
matched to the SUB of "go" ("I"). The ambiguous situation is easy to recognize: it
occurs whenever the main verb of a clause is "go", and the clause has two cases,
SUB and INF.
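A system reading Reader's output might recognize this shape as follows. The Python sketch encodes a clause as a dict with hypothetical "verb", "tense" and "cases" keys. It returns the alternative "future" reading with the dummy SUB matched, or None when the clause is not ambiguous. The new tense is left unset because the text does not spell out how it is derived.

```python
MATCH_TO_SUB = "!match_to_SUB"


def going_to_reading(clause):
    """Return the "future" reading of an ambiguous GO clause, or None.

    A clause is {"verb": ..., "tense": ..., "cases": {...}}.  The clause is
    ambiguous when its main verb is GO and its only cases are SUB and INF.
    The alternative reading promotes the INF clause to main clause and
    matches its dummy SUB to the SUB of GO.  Its tense would have to be
    derived from the whole verb group; that step is not sketched here.
    """
    cases = clause["cases"]
    if clause["verb"] != "GO" or set(cases) != {"SUB", "INF"}:
        return None
    inner = cases["INF"]
    inner_cases = {name: (cases["SUB"] if value == MATCH_TO_SUB else value)
                   for name, value in inner["cases"].items()}
    return {"verb": inner["verb"], "tense": None, "cases": inner_cases}


parse = {"verb": "GO", "tense": "NNC",
         "cases": {"SUB": "I",
                   "INF": {"verb": "WALK", "tense": "INF",
                           "cases": {"SUB": MATCH_TO_SUB}}}}
print(going_to_reading(parse))
# {'verb': 'WALK', 'tense': None, 'cases': {'SUB': 'I'}}
```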

Some temporal information is contained in the cases of the verb rather than the
tense. "I went yesterday" parses to

(GO PN
    [SUB I]
    [WHEN YESTERDAY]
)

so that the exact time in the past that the action occurred in is specified by the
WHEN case.

The verb "have" often occurs in verb groups as a modal. "I have to go away"
essentially means "I must go away". When "have" is used as a modal, it is
unambiguous. Therefore, when "...have to verb..." occurs as a verb group, Reader
returns verb as the main verb, assigns it the tense of the verb "have", and places
the marker "HAVE-TO" in its adverb slot. "I will have to leave" parses to:

(LEAVE FN (HAVE-TO)
    [SUB I]
)

This does not mean that every time the phrase "have to verb" appears in a
sentence, "have to" will be treated as a modal. The noun phrase "The book I
have to give" would be parsed into a three-structure stack:

    VERB:  ((GIVE))
3.  FUNCTION:  INF

    VERB:  ((HAVE))
    NOUN1:  I
2.  FUNCTION:  RC

    NOUN1:  (BOOK THE)
1.  FUNCTION:  MAIN

The stack can be interpreted in two different ways: "The book I must give." (3.
attached to 2. attached to 1.), or "The book I have in my possession which I will
give." (3. and 2. attached to 1. independently). Only the first interpretation treats
"have to" as a modal.




The tense contains all the information in the sentence, yet leaves the decision of
what to do with it for the system using the parser. For example, if the tense of a
statement is NN, the system can infer that a narrative is taking place, that the
action described in the statement is habitual, etc.

2.3.3  Noun  groups 

Reader uses a different representation for noun groups than most parsers. To
Reader, a noun group is a list whose first element is the head noun of the group, and
whose remaining elements are the modifiers of the head noun. The difference in
representation lies in the fact that Reader does not structure the modifiers that
preceded the noun in the original sentence.

Therefore, "The messy green garbage can cover" is parsed as

[NOUN (COVER THE MESSY GREEN GARBAGE CAN)]

since Reader does not try to determine which of the following this means:

1.  the cover of a can used for messy green garbage.

2.  the messy cover of a can used for green garbage.

3.  the messy green cover of a can used for garbage.

4.  the messy cover of a green can used for garbage.

5.  the cover of a messy green can used for garbage.

6.  the cover of a messy can used for green garbage.

Instead, it allows the deductive system to structure the noun group when the stack
entry containing the noun group is formatted (section 4.3). This is necessary to
avoid needless ambiguity. The sentence "A man people can trust is usually
dangerous" can be parsed (correctly) as:




(BE NN (USUALLY)
    [SUB (MAN A (TRUST (NN CAN)
                    [SUB PEOPLE]
                ))]
    [DES DANGEROUS]
)

But unless the parser can discover from the system that there is unlikely to be "a
man people can trust" (trust modified by can, people, man and a), it will also find

(BE NN (USUALLY)
    [SUB (TRUST A MAN PEOPLE CAN)]
    [DES DANGEROUS]
)

since "man", "can", and "people" are nouns, and therefore potential modifiers of
"trust". The modifiers that followed the noun in the original sentence are structured
by Reader, with help from the deductive system. This is necessary since Reader
must know whether a sentence constituent coming after the noun modifies it, the
verb the noun modifies, or some other constituent in the sentence. "The relation in
the concept that is marked 'possible'." is parsed as:

[NOUN (RELATION THE (IN (CONCEPT THE))
                    (MARK PN
                        [OBJ THAT]
                        [IOB "POSSIBLE"]
                    ))]

in a context where the deductive system was able to determine that relations had
markings and concepts did not, and as:

[NOUN (RELATION THE (IN (CONCEPT THE (MARK PN
                                         [OBJ THAT]
                                         [IOB "POSSIBLE"]
                                     ))))]

in a context where the deductive system thought that concepts were more likely to
have markings than relations. The "closer" modification is also the preferred one in
the absence of any information about whether concepts or relations have markings.




The point here is that each modifier (at top level in the noun group list) coming after
the noun⁵ modifies the noun independently.

Reader's technique of not structuring noun groups as they are encountered allows
it to parse more efficiently than a parser that gets involved in the structure of noun
groups immediately. Suppose we are given a sentence beginning with "The messy
green garbage can cover...". A parser that started out by trying to parse for a
structured noun group would immediately get bogged down trying to determine which
of the six or more possibilities the phrase represented. It would have to call in the
deductive system, which would then start looking for instances of green garbage,
messy cans, etc. By delaying the structuring until later, Reader can provide the
deductive system with more information (information including the main verb of the
clause, its cases and the case of the unknown noun group) to guide its search in
determining the structure of the noun group. And, if the entire sentence happened
to be "The messy green garbage can cover the earth.", no time will ever be wasted
structuring the noun group.
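The flat noun-group representation described in this section can be sketched in a few lines of Python. The head noun comes first and the pre-noun modifiers follow unstructured. Any post-noun modifiers come after a marker, and each of them attaches to the head independently. The marker symbol `!AFTER` is a made-up stand-in, because the report does not give the actual marker used in the non-pretty-printed output.

```python
# Hypothetical marker between the modifiers that came before the noun and
# those that came after it (the pretty-printed output omits it).
AFTER_MARK = "!AFTER"


def make_noun_group(head, before, after=()):
    """Flat noun group: head noun first, then the pre-noun modifiers left
    unstructured, then the post-noun modifiers (each attached to the head)."""
    group = [head, *before]
    if after:
        group += [AFTER_MARK, *after]
    return group


def pretty(group):
    """Render a noun group in the report's notation, dropping the marker."""
    return "(" + " ".join(str(part) for part in group if part != AFTER_MARK) + ")"


cover = make_noun_group("COVER", ["THE", "MESSY", "GREEN", "GARBAGE", "CAN"])
print(pretty(cover))   # (COVER THE MESSY GREEN GARBAGE CAN)
```

Nothing in the flat list commits to one of the six readings listed earlier. Choosing among them is left to the deductive system when the entry is formatted.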

2.3.4  Choices

Occasionally, a sentence contains an ambiguous constituent whose ambiguity can
be restricted to a small segment of the parse structure. When this happens,
Reader returns one parse structure, and offers a choice between the ambiguous
constituents. This leads to a more efficient parse, and enables the system reading
the parse to compare the different meanings of the sentence easily, since the
choice clearly shows where the parses differ. Here are two examples of this idea:

⁵ The non-pretty-printed version of the parser output contains a marker between
the modifiers which come before and after the noun.


"I knew that ice was slippery." could mean either "I knew that that ice was
slippery" or "I knew ice was slippery". If the deductive system is unable to
determine which noun group it prefers at the time it is asked to structure the noun
group, Reader would return the following parse, offering a choice for the SUB of
"be".

(KNOW PN
    [SUB I]
    [WHAT (BE PN
              [SUB (*CHOICE ICE
                            (ICE THAT)
                   )]
              [DES SLIPPERY]
          )]
)

"The man hitting Janet angered Bill" could mean either "The man who was hitting
Janet angered Bill" or "The man's hitting of Janet angered Bill". Reader represents
this as follows:

(ANGER PN
    [SUB (*CHOICE (HIT CC
                      [SUB (MAN THE)]
                      [OBJ JANET]
                  )
                  (MAN THE (HIT CC
                               [SUB !match_to_head_noun]
                               [OBJ JANET]
                           ))
         )]
    [OBJ BILL]
)

The first choice is the action "hit". The second choice is "man" modified by "the"
and a verb clause with a dummy SUB (!match_to_head_noun) that should be matched
to the noun it is modifying ("man"). In general, a choice can be offered as the
contents of any case.
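A choice node can always be expanded into the full set of separate parses when needed. The Python sketch below encodes parse structures as nested tuples and treats a node whose first element is "*CHOICE" as "exactly one of the remaining elements". The tuple encoding is an assumption made for illustration. The expansion multiplies across independent choices, which is the cost Reader avoids by returning a single structure.

```python
from itertools import product

CHOICE = "*CHOICE"


def expansions(tree):
    """Yield every fully disambiguated version of a parse tree.

    Trees are nested tuples of symbols; a node ("*CHOICE", a, b, ...)
    stands for exactly one of its alternatives a, b, ...
    """
    if not isinstance(tree, tuple):
        yield tree
        return
    if tree and tree[0] == CHOICE:
        for alternative in tree[1:]:
            yield from expansions(alternative)
        return
    # Expand every child independently, then take all combinations.
    for combo in product(*(list(expansions(child)) for child in tree)):
        yield tuple(combo)


ice = ("KNOW", "PN", ("SUB", "I"),
       ("WHAT", ("BE", "PN", ("SUB", (CHOICE, "ICE", ("ICE", "THAT"))),
                 ("DES", "SLIPPERY"))))
for reading in expansions(ice):
    print(reading)
```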


Another method Reader uses for representing ambiguous sentences is prefixing the
name of a case with an asterisk. This means that the case can modify either the
verb or the noun in the case directly above it. "John hits the salesman with the
hammer" is parsed to

(HIT NN
    [SUB JOHN]
    [OBJ (SALESMAN THE)]
    [*PREP (WITH (HAMMER THE))]
)

The asterisk preceding the case name "PREP" indicates that the PREP case could
be a case of "hit" or that it could modify the salesman. The first interpretation is
"The salesman was hit by John with the hammer" and the second is "The salesman
with the hammer was hit by John". Reader uses the asterisk notation when running
without a deductive system, or when running with a deductive system that cannot
decide which interpretation is more likely at the time Reader asks. The parse would
have been

(HIT NN
    [SUB JOHN]
    [OBJ (SALESMAN THE (WITH (HAMMER THE)))]
)

if the system was able to determine that the salesman had the hammer when given the
choice by Reader.
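The two readings of a starred case can be produced mechanically. This Python sketch uses the same assumed tuple encoding, where a clause is (VERB, TENSE, (CASE, value), ...). It returns the verb attachment, with the asterisk dropped, and the noun attachment, with the value appended to the noun group in the case directly above. The second reading reproduces the alternative parse shown above.

```python
def asterisk_readings(clause):
    """Return the readings of a clause with a starred case.

    clause = (VERB, TENSE, (CASE, value), ...).  A starred case
    ("*NAME", value) modifies either the verb or the noun group in the
    case directly above it.  Only the first starred case is expanded.
    """
    head, cases = clause[:2], list(clause[2:])
    for i, (name, value) in enumerate(cases):
        if name.startswith("*") and i > 0:
            above_name, above_noun = cases[i - 1]
            if not isinstance(above_noun, tuple):
                above_noun = (above_noun,)        # bare noun, e.g. JOHN
            # Reading 1: the case modifies the verb; just drop the asterisk.
            verb_reading = (head + tuple(cases[:i]) + ((name[1:], value),)
                            + tuple(cases[i + 1:]))
            # Reading 2: the value joins the modifiers of the noun above.
            noun_reading = (head + tuple(cases[:i - 1])
                            + ((above_name, above_noun + (value,)),)
                            + tuple(cases[i + 1:]))
            return [verb_reading, noun_reading]
    return [clause]


hit = ("HIT", "NN", ("SUB", "JOHN"), ("OBJ", ("SALESMAN", "THE")),
       ("*PREP", ("WITH", ("HAMMER", "THE"))))
for reading in asterisk_readings(hit):
    print(reading)
```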

2.3.5  Conventions

Reader employs several notational conventions.

Whenever a conjunction contains an implied SUB, as in "The program reads the data
and prints the answer", the implicit SUB is represented by the symbol
"!match_to_conjunct_SUB", e.g.,




[CONJ AND
    (READ NN
        [SUB (PROGRAM THE)]
        [OBJ (DATA THE)]
    )
    (PRINT NN
        [SUB !match_to_conjunct_SUB]
        [OBJ (ANSWER THE)]
    )
]

!match_to_conjunct_SUB has the same referent as "the program".


When a noun is modified by a relative clause, the case the noun occupies in the
relative clause is held by the symbol !match_to_head_noun. For example,

"The man captured by the police."

[NOUN (MAN THE (CAPTURE PN
                   [OBJ !match_to_head_noun]
                   [SUB (POLICE THE)]
               ))]

"The man the police captured."

[NOUN (MAN THE (CAPTURE PN
                   [SUB (POLICE THE)]
                   [OBJ !match_to_head_noun]
               ))]

!match_to_head_noun has the same referent as "the man", the noun the verb clause
is modifying.


!match_to_head_noun is also used in sentences which contain dangling prepositions.
"The man I came with" parses to:

[NOUN (MAN THE (COME PN
                   [SUB I]
                   [PREP (WITH !match_to_head_noun)]
               ))]

!match_to_head_noun has the same referent as the noun ("the man") modified by
the clause which contains the dangling preposition.
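A consumer of the parse has to bind !match_to_head_noun to its referent. This Python sketch does that by plain substitution over a tuple encoding of a noun group. The referent is taken to be the head noun with its word modifiers, for example ("MAN", "THE"), which is a simplifying assumption. A noun group nested inside the clause that carries its own relative clause would wrongly receive the same referent, so nesting is not handled.

```python
HEAD_NOUN = "!match_to_head_noun"


def substitute(tree, symbol, referent):
    """Return a copy of tree with every occurrence of symbol replaced."""
    if isinstance(tree, tuple):
        return tuple(substitute(part, symbol, referent) for part in tree)
    return referent if tree == symbol else tree


def resolve_relative_clauses(noun_group):
    """Fill !match_to_head_noun in the modifiers of one noun group.

    noun_group = (HEAD, modifier, ...), where clauses and prepositional
    phrases are tuples.  The referent used is the head noun with its word
    modifiers, e.g. ("MAN", "THE") for "the man".
    """
    referent = tuple(m for m in noun_group if not isinstance(m, tuple))
    return tuple(substitute(m, HEAD_NOUN, referent) for m in noun_group)


came_with = ("MAN", "THE", ("COME", "PN", ("SUB", "I"),
                            ("PREP", ("WITH", HEAD_NOUN))))
print(resolve_relative_clauses(came_with))
```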




When a conjunction contains an implied object, Reader uses the symbol
!match_to_conjunct_OBJ⁶ to mark the second occurrence. "He breeds and raises
rabbits" parses to:

[CONJ AND
    (BREEDS NN
        [SUB HE]
        [OBJ (RABBIT !PL)]
    )
    (RAISES NN
        [SUB !match_to_conjunct_SUB]
        [OBJ !match_to_conjunct_OBJ]
    )
]


In conjunctions in which the verb is omitted, Reader simply repeats⁷ the verb.
"He gave John a pencil and Janet a pen" parses to:

[CONJ AND
    (GIVE PN
        [SUB HE]
        [IOB JOHN]
        [OBJ (PENCIL A)]
    )
    (GIVE PN
        [SUB !match_to_conjunct_SUB]
        [IOB JANET]
        [OBJ (PEN A)]
    )
]
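The conjunct-matching symbols can be resolved in a similar way. In this Python sketch, which uses an assumed tuple encoding, each later conjunct takes a missing SUB or OBJ from the first conjunct. In keeping with footnote 7, the filled-in value is the same object as in the first conjunct rather than a fresh copy, so both conjuncts point at one referent. The !match_to_conjunct_PREP case of footnote 6 is not covered.

```python
CONJ_SUB = "!match_to_conjunct_SUB"
CONJ_OBJ = "!match_to_conjunct_OBJ"


def case_value(clause, name):
    """Value of case name in clause = (VERB, TENSE, (CASE, value), ...)."""
    for case_name, value in clause[2:]:
        if case_name == name:
            return value
    return None


def resolve_conjunction(conj):
    """Fill conjunct-matching symbols in ("CONJ", "AND", first, second, ...).

    Each later conjunct takes a missing SUB or OBJ from the first conjunct.
    The filled-in value is the same object as in the first conjunct.
    """
    tag, word, first, *rest = conj
    fills = {CONJ_SUB: case_value(first, "SUB"),
             CONJ_OBJ: case_value(first, "OBJ")}

    def fill(clause):
        return clause[:2] + tuple(
            (name, fills[value]) if value in fills else (name, value)
            for name, value in clause[2:])

    return (tag, word, first) + tuple(fill(c) for c in rest)


rabbits = ("CONJ", "AND",
           ("BREEDS", "NN", ("SUB", "HE"), ("OBJ", ("RABBIT", "!PL"))),
           ("RAISES", "NN", ("SUB", CONJ_SUB), ("OBJ", CONJ_OBJ)))
print(resolve_conjunction(rabbits)[3])
# ('RAISES', 'NN', ('SUB', 'HE'), ('OBJ', ('RABBIT', '!PL')))
```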


Suffixes are removed by the parser. If a word is a plural, the symbol !PL appears in
its modifier list. "The answers" parses to:

[NOUN (ANSWER THE !PL)]

If a word can be either singular or plural, and agreement constraints force it to be one or
the other, it is noted by inserting !PL or !SING into the modifier list. "The fish is
dangerous." and "The fish are dangerous" parse to:

(BE NN                          (BE NN
    [SUB (FISH THE !SING)]          [SUB (FISH THE !PL)]
    [DES DANGEROUS]                 [DES DANGEROUS]
)                               )

In "The fish can be dangerous", the SUB case is [SUB (FISH THE)] since there is
no agreement information.

⁶ !match_to_conjunct_PREP is used when the OBJ refers to the object of a
preposition in the higher conjunct.

⁷ Nouns are represented by symbols (rather than being repeated) so that the
interpreter will not have to find the referent of the same noun twice.




3.  Grammar  writing 

This chapter explains how to write grammars in the formalism we have been
discussing. The actual grammar is written in Lisp, and consists of a set of programs,
one for each word class, which explain when and how a word may be added to a
partial parse. The grammar also uses several utility programs and predicates.

An example of a utility program is ADD-NOUN. It takes two arguments, a noun group
(ng) and a stack structure (s), and returns the stack structure with the noun group
added to it. For example, if

ng = (MAN THE)  and  s is      VERB:  ((SAVE ED))
                               NOUN1:  (BOY THE)
                               FUNCTION:  MAIN

then (ADD-NOUN ng s) is        VERB:  ((SAVE ED))
                               NOUN1:  (BOY THE)
                               NOUN2:  (MAN THE)
                               FUNCTION:  MAIN
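ADD-NOUN can be mimicked in Python with an immutable record for a stack structure. This is a simplified sketch and not Reader's Lisp representation. The `Structure` class and its fields are assumptions. Returning a new structure instead of modifying the old one is also a design assumption: it means partial parses that share a structure are not disturbed when one of them adds a noun.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Structure:
    """A stack structure (simplified): verb, nouns NOUN1, NOUN2, ...,
    and the structure's FUNCTION (MAIN, RC, INF, ...)."""
    function: str
    verb: tuple = ()
    nouns: tuple = ()


def add_noun(ng, s):
    """ADD-NOUN: return s with ng added as its next noun.  The original
    structure is left unchanged."""
    return replace(s, nouns=s.nouns + (ng,))


s = Structure("MAIN", verb=(("SAVE", "ED"),), nouns=(("BOY", "THE"),))
t = add_noun(("MAN", "THE"), s)
print(t.nouns)   # (('BOY', 'THE'), ('MAN', 'THE'))
```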

An example of a predicate is CAN-ACCEPT-A-NOUN. It takes one argument, which is
a structure, and returns T if the structure can accept a noun, and NIL otherwise. A
structure can accept a noun if it is either

1.  a preposition structure without a noun,

2.  a verb structure without a noun,

3.  a verb structure with a verb and one noun whose verb is transitive.
    If the verb group is passive, the main verb must take a beneficiary or
    indirect object, or

4.  a verb structure with two nouns and a main verb that takes a
    beneficiary or indirect object. The verb group must not be passive.

3. and 4. must also satisfy the condition that the verb has not received any cases
since it was added to the structure.¹ On the surface, it would seem that this
definition would rule out slightly peculiar constructions like "That he likes" (instead
of "He likes that"), since a verbless verb structure with one noun cannot accept
another noun. However, such constructions are handled as relative clauses.
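Conditions 1 to 4 translate almost line for line into a predicate. In this Python sketch, the lexical and syntactic features (transitivity, whether the verb takes a beneficiary or indirect object, passivity, and a count of cases received after the verb) are assumed fields of a simplified `Structure` record. Reader itself obtains them from its lexicon and the structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Structure:
    kind: str                      # "VERB" or "PREP" structure
    nouns: tuple = ()
    verb: Optional[str] = None     # root of the main verb, if any
    transitive: bool = False       # lexical features of the verb (assumed)
    takes_iob: bool = False        # takes a beneficiary or indirect object
    passive: bool = False          # the verb group is passive
    cases_since_verb: int = 0      # cases received after the verb was added


def can_accept_a_noun(s):
    """Sketch of CAN-ACCEPT-A-NOUN following conditions 1-4 in the text."""
    if s.kind == "PREP":
        return not s.nouns                                      # 1.
    if s.kind != "VERB":
        return False
    if not s.nouns:
        return True                                             # 2.
    if s.verb is None or s.cases_since_verb > 0:
        return False            # 3. and 4. need a verb with no later cases
    if len(s.nouns) == 1:
        return s.transitive and (not s.passive or s.takes_iob)  # 3.
    if len(s.nouns) == 2:
        return s.takes_iob and not s.passive                    # 4.
    return False
```

The verbless structure holding "he" in "That he likes" fails the test, as the paragraph above notes.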

Reader has other predicates which test for legal verb groups, whether a structure
has a noun which can be modified by another structure, whether the verb group of a
structure is passive or active, etc. When, in describing the actions of the parser,
we say that a structure satisfies some condition, we mean that the proper
predicate has been applied to that structure and that the test has succeeded.

Reader also has two programs, SHIFT and SEARCH, which are useful for manipulating
the stack. SEARCH is used to search the stack for structures with a certain
property. The information gained from a search is usually used to determine
whether a particular structure should be pushed onto the stack. For instance, it
would be pointless to push a relative clause structure (section 3.1.3) onto the
stack if there were no structures in the stack that contained a noun which could be
modified by a relative clause. SHIFT, described more fully in section 3.1.2, is used
to facilitate the addition of words to structures other than the one at the top of the
stack. Basically, SHIFT searches the stack for a given structure, collapses the
stack down to that structure, and then applies the input word to the resulting stack.
SHIFT is important because most actions that can be applied to the top of the stack,
such as adding in a noun or verb, can also be applied to structures lower down in
the stack. Similarly, SEARCH is important because pushing a structure onto the
stack usually depends on the existence of a structure with a given property,
regardless of its position in the stack.
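The division of labor between SEARCH and SHIFT can be sketched as two small Python functions over a list whose last element is the top of the stack. How Reader actually folds one structure into the one beneath it is not described here, so `collapse_into` and `apply_word` are hypothetical parameters supplied by the caller. The toy versions in the usage lines simply nest tuples.

```python
def search(stack, predicate):
    """SEARCH: indices of the structures satisfying predicate, top first.
    The stack is a list whose last element is the top."""
    return [i for i in range(len(stack) - 1, -1, -1) if predicate(stack[i])]


def shift(stack, index, collapse_into, apply_word, word):
    """SHIFT: collapse the stack down to stack[index], then apply word.

    collapse_into(lower, upper) stands in for however Reader folds a
    structure into the one beneath it; apply_word(stack, word) returns the
    list of new stacks (partial parses) the word produces, possibly empty.
    The input stack is not modified.
    """
    stack = list(stack)
    while len(stack) - 1 > index:
        upper = stack.pop()
        stack[-1] = collapse_into(stack[-1], upper)
    return apply_word(stack, word)


def nest(lower, upper):          # toy collapse: make upper a part of lower
    return lower + (upper,)


def add_word(stack, word):       # toy action: add word to the top structure
    return [stack[:-1] + [stack[-1] + (word,)]]


stack = [("MAIN",), ("RC",), ("INF",)]
print(search(stack, lambda s: s[0] != "INF"))   # [1, 0]
print(shift(stack, 0, nest, add_word, "GIVE"))
# [[('MAIN', ('RC', ('INF',)), 'GIVE')]]
```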

¹ E.g., "He spent in the store the money." is incorrect.




3.1  Some  beginning  grammars 

A series of grammars is described, each one more complicated than the previous
one. An example sentence is parsed for each grammar defined. The first two
examples, Grammar.1 and Grammar.2, will step through the sentence in detail,
examining how each successive word is applied to the partial parses formed by the
application of the previous words in the sentence. The remainder of the examples
will cover only the methods used to apply words that were not handled by the
previously defined grammars.

Section 3.2 shows some more efficient methods for parsing the subset of English
handled by the example grammars.


The variables used in the examples are:

stack        The stack.
word         The current input word.
root         The root of word.
ending       The ending of word.
ml           The unassigned modifier list.
msg          The message concerning the top of the stack.
stack-msg    The message concerning the entire stack.


3.1.1  Grammar.1 


The first grammar handles sentences of the form "noun verb noun noun" or "noun
verb noun". All that is needed is a NOUN program and a VERB program.

The NOUN program:

The NOUN program forms the noun group consisting of the modifiers on the
modifier list and the noun. Then, if the top structure in the stack can accept
a noun (i.e., satisfies the predicate CAN-ACCEPT-A-NOUN, defined at the
beginning of the chapter), a partial parse is created with:

msg = NOUN, indicating that the last addition to the stack was a noun.
ml = NIL, the modifier list is empty.
stack-msg = stack-msg, the addition of a noun doesn't change stack-msg.
stack = (REPLACE-TOP-STACK (ADD-NOUN (MAKE-NOUN-GROUP word ml)
                                     (TOP-STACK stack))
                           stack)

where MAKE-NOUN-GROUP is a predicate which returns the noun group formed
by its arguments (or NIL if one cannot be formed), and TOP-STACK and
REPLACE-TOP-STACK are utility programs. TOP-STACK returns the top
structure of the stack that is its argument. REPLACE-TOP-STACK returns the
stack which is its second argument with the top structure replaced by its first
argument.

The VERB program:

The VERB program examines the stack. If the top structure in the stack is a
verb structure with one noun and no verb, it creates a partial parse by adding
the verb to the top structure in the stack.


Here is how this grammar parses the sentence "John drinks water."

Reader starts out with the initial partial parse:

msg = BEGIN, ml = NIL

FUNCTION: MAIN

"John" is input. It belongs to only one word class (NOUN), and therefore has only
one program associated with it (NOUN). The partial parse produced by applying the
NOUN program is:

msg = NOUN, ml = NIL

NOUN1: JOHN
FUNCTION: MAIN

"drinks" is the next word; it can be used as either a noun or a verb. The top stack
structure cannot accept a noun, so the application of the NOUN program does not
result in a continuation of the parse. The VERB program is then applied to the parse,
which causes the following partial parse to be set up:

msg = VERB, ml = NIL

VERB: ((DRINK . S))
NOUN1: JOHN
FUNCTION: MAIN


"Water" can also be used as a noun or a verb. The VERB program fails, though, since
the top structure already has a verb. The NOUN program succeeds in continuing the
parse by adding the noun "water" to the top structure in the stack, producing

msg = NOUN, ml = NIL

VERB: ((DRINK . S))
NOUN1: JOHN
NOUN2: WATER
FUNCTION: MAIN


The input sentence is exhausted, so Reader collapses the stack (trivial, since there
is only one structure in it) and formats the resulting structure. This yields

(DRINK NN
   [SUB JOHN]
   [OBJ WATER]
)

as  the  parse. 
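The control cycle this walkthrough follows can be sketched in Python: for each word, the program of every word class the word belongs to is applied to every live partial parse, and only the results that continue survive; at the end, parses with a non-empty modifier list are discarded. This is a minimal sketch of Grammar.1 under stated assumptions: the dataclasses, the toy CAN-ACCEPT-A-NOUN rule, the lexicon format and the SUB/OBJ dictionary standing in for Format are all hypothetical, and only the trivial one-structure collapse is handled.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple

Verb = Tuple[str, str]  # (root, suffix), e.g. ("DRINK", "S")


@dataclass(frozen=True)
class VerbStructure:
    nouns: Tuple[str, ...] = ()
    verb: Optional[Verb] = None
    function: str = "MAIN"


@dataclass(frozen=True)
class PartialParse:
    msg: str
    ml: Tuple[str, ...]
    stack: Tuple[VerbStructure, ...]  # stack[0] is the top structure


def can_accept_a_noun(s: VerbStructure) -> bool:
    # Toy rule: the subject slot is open, or a verb is present and the object slot is open.
    return not s.nouns or (s.verb is not None and len(s.nouns) < 2)


def noun_program(word: str, p: PartialParse) -> Optional[PartialParse]:
    top = p.stack[0]
    if not can_accept_a_noun(top):
        return None
    new_top = replace(top, nouns=top.nouns + (word,))
    return PartialParse("NOUN", (), (new_top,) + p.stack[1:])


def verb_program(word: Verb, p: PartialParse) -> Optional[PartialParse]:
    top = p.stack[0]
    # Grammar.1: only a structure with one noun and no verb takes a verb.
    if top.verb is not None or len(top.nouns) != 1:
        return None
    return PartialParse("VERB", p.ml, (replace(top, verb=word),) + p.stack[1:])


PROGRAMS: Dict[str, Callable] = {"NOUN": noun_program, "VERB": verb_program}


def parse(words: List[str], lexicon: Dict[str, list]) -> list:
    parses = [PartialParse("BEGIN", (), (VerbStructure(),))]
    for w in words:
        continued = []
        for p in parses:
            # Try the program of every word class w belongs to.
            for cls, value in lexicon[w]:
                r = PROGRAMS[cls](value, p)
                if r is not None:
                    continued.append(r)
        parses = continued
    results = []
    for p in parses:
        # Drop parses with leftover modifiers or no verb; the collapse is trivial here.
        if p.ml or len(p.stack) != 1 or p.stack[0].verb is None:
            continue
        s = p.stack[0]
        results.append((s.verb, dict(zip(("SUB", "OBJ"), s.nouns))))
    return results


LEXICON = {  # hypothetical lexicon: word -> [(word class, value)]
    "JOHN": [("NOUN", "JOHN")],
    "DRINKS": [("NOUN", "DRINKS"), ("VERB", ("DRINK", "S"))],
    "WATER": [("NOUN", "WATER"), ("VERB", ("WATER", ""))],
}
```

With the toy lexicon, `parse(["JOHN", "DRINKS", "WATER"], LEXICON)` returns a single parse, because the noun reading of "drinks" and the verb reading of "water" are rejected as soon as they are tried.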


3.1.2 Grammar.2


In order to parse more interesting sentences, it is necessary to expand the
grammar. The next grammar includes prepositions, articles and modifiers.

The  MODIFIER  program  simply  adds  word  to  ml. 

The ARTICLE program adds word (which is an article) to ml if ml is NIL or
consists of words (almost, all, etc.) which can appear before an article.

The PREPOSITION program checks to see whether the preposition can be
modified by the modifiers on ml. If so, the partial parse is continued by
pushing a preposition structure with word as the preposition onto the
stack.
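The three new programs of Grammar.2 are small enough to sketch directly. In this Python sketch the word lists PRE_ARTICLE and PREP_MODIFIERS are hypothetical stand-ins for lexicon properties, ml is kept newest-first (so "the city" gives (CITY THE), as in the example that follows), and recording the accepted ml modifiers on the new preposition structure is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical word lists; Reader would take these from its lexicon.
PRE_ARTICLE = {"ALMOST", "ALL"}     # words that may appear before an article
PREP_MODIFIERS = {"JUST", "RIGHT"}  # modifiers a preposition accepts ("just after")


@dataclass(frozen=True)
class PrepStructure:
    prep: str
    modifiers: Tuple[str, ...] = ()
    noun: Optional[Tuple[str, ...]] = None


@dataclass(frozen=True)
class PartialParse:
    msg: str
    ml: Tuple[str, ...]         # newest modifier first, e.g. ("CITY", "THE")
    stack: Tuple[object, ...]   # stack[0] is the top structure


def modifier_program(word: str, p: PartialParse) -> PartialParse:
    # MODIFIER: simply add word to ml.
    return PartialParse(p.msg, (word,) + p.ml, p.stack)


def article_program(word: str, p: PartialParse) -> Optional[PartialParse]:
    # ARTICLE: allowed if ml is empty or holds only pre-article words.
    if all(m in PRE_ARTICLE for m in p.ml):
        return PartialParse(p.msg, (word,) + p.ml, p.stack)
    return None


def preposition_program(word: str, p: PartialParse) -> Optional[PartialParse]:
    # PREPOSITION: every modifier on ml must be one the preposition accepts.
    if not all(m in PREP_MODIFIERS for m in p.ml):
        return None
    return PartialParse("PREP", (), (PrepStructure(word, p.ml),) + p.stack)
```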


As the grammar grows, the grammar programs have to be prepared to handle stacks
containing more than one structure. In general, there will be two parts to every
grammar program: a set of actions associated with just the top of the stack, and a
set of actions that should be applied to every structure in the stack that satisfies
certain conditions. For example, in parsing "He gave the man in the store the book.",
a noun (the book) must be added to a structure (He gave the man) which is not at
the top of the stack. Adding words to structures below the top of the stack is
facilitated by the program SHIFT.

(SHIFT stack program args purpose number predicate1 predicate2)

The idea behind SHIFT is to find a structure (or structures) in the stack which
satisfies a given predicate (CAN-ACCEPT-A-NOUN, for example, would be used to search
down the stack for a structure to add a noun to), then collapse the stack down to
that structure, and then apply a program to the collapsed stack. SHIFT enables the
grammar writer to specify the purpose of the collapse, which is valuable in guiding
the way the collapse is carried out. For instance, if SHIFT is collapsing the stack of
the sentence "He gave the man in the store ..." for the purpose of finding a
structure which can accept a noun, it knows not to try to attach "in the store" to
"gave", since that would prevent "gave" from accepting another noun.

SHIFT works as follows: it searches down stack looking for a structure S that
satisfies predicate1. stack is then divided into two segments: S1, starting from the
top of stack and going down to S, and S2, consisting of the structures not in S1. S1
is then collapsed into a single structure SS. If SS satisfies predicate2, then
program is applied to (STACK-PUSH SS S2) with arguments equal to args. number
controls how many times the sequence is performed. If number is an integer n,
SHIFT tries to find the first n structures that satisfy predicate1; number = T means
that SHIFT finds all the stack structures satisfying predicate1. purpose is an atom
(e.g., NOUN means the collapse is looking for a structure which can accept a noun)
which controls how structures can be attached to one another.
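A sketch of SHIFT's control flow in Python, assuming tuples for stacks (top first), `True` standing in for number = T, and a caller-supplied collapse function. The toy_collapse shown here is hypothetical: it attaches everything above the found structure to it and ignores purpose, whereas the collapse described above uses purpose to choose attachment points.

```python
from typing import Callable, Tuple, Union

Stack = Tuple[dict, ...]  # stack[0] is the top structure


def shift(stack: Stack, program: Callable, args: tuple, purpose: str,
          number: Union[int, bool], predicate1: Callable[[dict], bool],
          predicate2: Callable[[dict], bool],
          collapse: Callable[[Stack, str], dict]) -> list:
    """Sketch of SHIFT. number is an int n, or True (standing in for T) for "all"."""
    results = []
    found = 0
    for i, s in enumerate(stack):
        if number is not True and found >= number:
            break
        if not predicate1(s):
            continue
        found += 1
        s1, s2 = stack[: i + 1], stack[i + 1:]   # S1: top down to S; S2: the rest
        ss = collapse(s1, purpose)               # S1 collapsed into SS
        if ss is not None and predicate2(ss):
            result = program((ss,) + s2, *args)  # (STACK-PUSH SS S2)
            if result is not None:
                results.append(result)
    return results


def toy_collapse(s1: Stack, purpose: str) -> dict:
    # Hypothetical collapse: attach every structure above the bottom one to it.
    bottom = dict(s1[-1])
    bottom["attached"] = tuple(s["name"] for s in s1[:-1])
    return bottom
```

For example, with a two-structure stack whose top is a filled preposition structure, calling shift with an add-noun program and number = True collapses the preposition into the structure below and adds the noun there, yielding one continued stack.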




Grammar.2 involves adding a SHIFT to both the NOUN and VERB programs. The SHIFT
in NOUN searches for all structures in the stack which can accept a noun, and then
adds the word to each such structure. The SHIFT in VERB looks down the stack for the
topmost verb structure in the stack which can accept a verb.

Grammar.2 can handle sentences like "The woman from the city bank gave the man
in the store the news". The parse starts out with the initial parse. After "The" is
input, there is one partial parse:

msg = BEGIN, ml = (THE)

FUNCTION: MAIN

"woman" is read. MAKE-NOUN-GROUP forms the noun group (WOMAN THE).

msg = NOUN, ml = NIL

NOUN1: (WOMAN THE)
FUNCTION: MAIN

"from" is read. The PREPOSITION program causes a preposition structure to be
pushed on the stack.

msg = PREP, ml = NIL

PREP: FROM

NOUN1: (WOMAN THE)
FUNCTION: MAIN

"the" is read and placed on the modifier list. "city" is read. All nouns are treated
as both NOUNs and MODIFIERs, so there are now two partial parses:

1. msg = NOUN, ml = NIL

   PREP: FROM
   NOUN: (CITY THE)

   NOUN1: (WOMAN THE)
   FUNCTION: MAIN

2. msg = PREP, ml = (CITY THE)

   PREP: FROM

   NOUN1: (WOMAN THE)
   FUNCTION: MAIN

"bank" is read. When "bank" is applied as a verb, partial parse 2 cannot be
continued, since "bank" (as a verb) does not accept the modifiers (CITY THE) on
the modifier list. Partial parse 1 cannot be continued using "bank" as a verb since,
after SHIFT finds a structure that can accept a verb, the verb "bank" fails to agree
with the noun group (WOMAN THE). The agreement is tested using a predicate
which takes a verb structure as input. It returns NIL if the structure does not
exhibit agreement and, when the structure does agree, returns the structure
modified by any information supplied by agreement (e.g., "He saw" agrees only
when "saw" is viewed as the past tense of "see", as opposed to the present tense
of "saw"). Reader then applies "bank" to both partial parses as a noun. Partial
parse 1 does not contain a structure that can accept a noun, so no partial parses
can be continued from it. When "bank" is applied to partial parse 2, it accepts the
modifiers on the modifier list and is added to the top preposition structure,
producing

msg = NOUN, ml = NIL

PREP: FROM
NOUN: (BANK THE CITY)

NOUN1: (WOMAN THE)
FUNCTION: MAIN

"gave" is read. The SHIFT program searches down the stack looking for the first
structure that can accept a verb. It collapses the stack down to that structure and
adds in the verb, which produces
msg = VERB, ml = NIL

VERB: ((GIVE ED))
NOUN1: (WOMAN THE (FROM (BANK THE CITY)))
FUNCTION: MAIN

"the" and "man" are read in and handled by the MODIFIER and NOUN programs.
"man" is applied as both a noun and a modifier, so two partial parses result:


1. msg = NOUN, ml = NIL

   VERB: ((GIVE ED))
   NOUN1: (WOMAN THE (FROM (BANK THE CITY)))
   NOUN2: (MAN THE)
   FUNCTION: MAIN

2. msg = NOUN, ml = (MAN THE)

   VERB: ((GIVE ED))
   NOUN1: (WOMAN THE (FROM (BANK THE CITY)))
   FUNCTION: MAIN

"in" is read. The PREPOSITION program causes a preposition structure to be pushed
on the stack of partial parse 1. Nothing is done with partial parse 2, since the
preposition does not accept the modifiers (MAN THE) on the modifier list.

msg = PREP, ml = NIL

PREP: IN

VERB: ((GIVE ED))
NOUN1: (WOMAN THE (FROM (BANK THE CITY)))
NOUN2: (MAN THE)
FUNCTION: MAIN

"the" and "store" are read. As before, two parses are created when "store" is
read in: one in which the noun group "the store" becomes the noun of the
preposition structure on the top of the stack, and another in which "store" is
treated as a modifier. When "store" is tried as a verb it fails, since it cannot
accept "the" as a modifier. The next "the" is then read in. In the former partial
parse, it is simply added to the modifier list. In the latter, it cannot be added to
the modifier list, since the modifier list contains a word (store) which cannot occur
before an article.

msg = NOUN, ml = (THE)

PREP: IN
NOUN: (STORE THE)

VERB: ((GIVE ED))
NOUN1: (WOMAN THE (FROM (BANK THE CITY)))
NOUN2: (MAN THE)
FUNCTION: MAIN

"news" is read. When it is applied as a noun, SHIFT searches for a structure on the
stack that can accept a noun, collapses the stack to that structure, and then adds
in the noun group (NEWS THE). When "news" is tried as a modifier, it is simply
added to the modifier list.


1. msg = NOUN, ml = NIL

   VERB: ((GIVE ED))
   NOUN1: (WOMAN THE (FROM (BANK THE CITY)))
   NOUN2: (MAN THE (IN (STORE THE)))
   NOUN3: (NEWS THE)
   FUNCTION: MAIN

2. msg = NOUN, ml = (NEWS THE)

   PREP: IN
   NOUN: (STORE THE)

   VERB: ((GIVE ED))
   NOUN1: (WOMAN THE (FROM (BANK THE CITY)))
   NOUN2: (MAN THE)
   FUNCTION: MAIN


There are no more input words. Partial parse 2 is discarded since its modifier list is
not empty. The stack from partial parse 1 is collapsed (once again, this is trivial
since there is only one structure in the stack), and the resulting structure is
formatted and returned as the parse of the sentence.

(GIVE PN
   [SUB (WOMAN THE (FROM (BANK THE CITY)))]
   [IOB (MAN THE (IN (STORE THE)))]
   [OBJ (NEWS THE)]
)
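The agreement predicate used in this walkthrough when "bank" is tried as a verb can be sketched as a filter over the verb's possible readings: it returns None (NIL) when no reading agrees, and otherwise the structure narrowed to the agreeing readings, which is how "He saw" settles on the past tense of "see". The reading table, the person labels and the THIRD_SINGULAR set below are hypothetical simplifications, not Reader's lexicon.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

Reading = Tuple[str, str, str]  # (root, tense, person it agrees with)

# Hypothetical readings: "saw" is ambiguous between past "see" and present "saw".
READINGS = {
    "SAW": (("SEE", "ED", "ANY"), ("SAW", "PRESENT", "NON3SG")),
    "BANK": (("BANK", "PRESENT", "NON3SG"),),
}
THIRD_SINGULAR = {"HE", "SHE", "IT", "WOMAN", "MAN"}  # toy person/number lookup


@dataclass(frozen=True)
class VerbStructure:
    subject: str                    # head noun of the subject noun group
    readings: Tuple[Reading, ...]   # verb readings still possible


def agree(s: VerbStructure) -> Optional[VerbStructure]:
    """None (NIL) if no reading agrees; else the structure narrowed to those that do."""
    person = "3SG" if s.subject in THIRD_SINGULAR else "NON3SG"
    kept = tuple(r for r in s.readings if r[2] in ("ANY", person))
    return replace(s, readings=kept) if kept else None
```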


3.1.3  Grammar.3 

Grammar.3 expands Grammar.2 by the inclusion of verb groups and relative clauses.

To parse relative clauses, a test is added to NOUN that checks to see if there is a
structure in the stack which has a noun that can be modified, using the predicate
CAN-NOUN-BE-MODIFIED. If the test succeeds, NOUN pushes a verb structure with
function equal to RC on the stack and adds the noun group to it. This addition enables
the grammar to parse "The mirror on the wall he broke". The parse proceeds
exactly as the previous ones until "he" is reached. The partial parse[2] when "he" is
read is

msg = NOUN, ml = NIL

PREP: ON
NOUN: (WALL THE)

NOUN1: (MIRROR THE)
FUNCTION: MAIN

[2] There are actually two partial parses. The second uses "wall" as a modifier and
is discontinued since MAKE-NOUN-GROUP fails to make a noun group from "he" and
(WALL THE).

CAN-NOUN-BE-MODIFIED succeeds on the preposition structure on the top of the
stack. Therefore a parse is created with a verb structure pushed onto the
previous stack. Only one parse results from applying NOUN to the parse, since
when SHIFT is called, it cannot find a structure that can accept a noun.

msg = NOUN, ml = NIL

NOUN1: HE
FUNCTION: RC

PREP: ON
NOUN: (WALL THE)

NOUN1: (MIRROR THE)
FUNCTION: MAIN

"broke" is read. SHIFT is called to find a verb structure with an open verb slot. It
finds the top structure in the stack, and creates a parse with the verb added in.

msg = VERB, ml = NIL

VERB: ((BREAK ED))
NOUN1: HE
FUNCTION: RC

PREP: ON
NOUN: (WALL THE)

NOUN1: (MIRROR THE)
FUNCTION: MAIN

The sentence is over, and the parse is concluded by the collapse of the stack. The
deductive system must decide which of "the wall" or "the mirror" was broken. If
we assume that "the mirror" was broken, the collapse of the stack would be

NOUN1: (MIRROR THE (ON (WALL THE)) (BREAK PN (SUB HE)
                                             (OBJ !match_to_head_noun)))
FUNCTION: MAIN


The format of such a structure is simply the noun. Reader returns

[NOUN (MIRROR THE (ON (WALL THE))
         (BREAK PN
            [SUB HE]
            [OBJ !match_to_head_noun]))]

as the parse. "Mirror" is the OBJ of the verb "break". Notice that the noun on
which CAN-NOUN-BE-MODIFIED succeeded was not the noun that was modified by
the relative clause.
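The Grammar.3 relative-clause test can be sketched as a NOUN program that returns every stack the noun group can continue to. In this Python sketch, structures are dicts, and can_noun_be_modified and can_accept_a_noun are toy versions of the predicates named in the text; for brevity only the top structure is tried for ordinary attachment, where Grammar.3 would use SHIFT.

```python
from typing import Tuple

Stack = Tuple[dict, ...]  # stack[0] is the top structure


def can_noun_be_modified(s: dict) -> bool:
    # Toy CAN-NOUN-BE-MODIFIED: the structure already holds a noun group.
    return bool(s.get("noun")) or bool(s.get("nouns"))


def can_accept_a_noun(s: dict) -> bool:
    # Toy rule: an empty preposition, an open subject slot, or an open object slot.
    if s["kind"] == "PREP":
        return s.get("noun") is None
    return not s["nouns"] or (s["verb"] is not None and len(s["nouns"]) < 3)


def add_noun(s: dict, group) -> dict:
    if s["kind"] == "PREP":
        return dict(s, noun=group)
    return dict(s, nouns=s["nouns"] + (group,))


def noun_program(group, stack: Stack) -> list:
    """Return every stack the noun group can continue to (Grammar.3 sketch)."""
    results = []
    # Ordinary attachment; only the top structure is tried in this sketch.
    if can_accept_a_noun(stack[0]):
        results.append((add_noun(stack[0], group),) + stack[1:])
    # Relative-clause test: push a new RC verb structure holding the noun group.
    if any(can_noun_be_modified(s) for s in stack):
        rc = {"kind": "VERB", "nouns": (group,), "verb": None, "function": "RC"}
        results.append((rc,) + stack)
    return results
```

Applied to the stack for "The mirror on the wall" with the noun "he", it yields exactly one continuation, the pushed RC structure, in line with the single partial parse described for that step.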


Parsing verb groups requires the addition of a test to VERB which checks whether msg
equals VERB. If the test succeeds, meaning that the last thing done to the stack
was the addition of a verb, VERB tries to form a verb group with word and the verbs
already in the top structure in the stack. If a legal verb group can be formed (this
is checked by the same predicate which tenses the verbs in a structure), the parse
is continued by adding the verb into the verb group slot of the top structure in the
stack. As an example, consider "He was given the prize". When "given" is read,
there is one partial parse:

msg = VERB, ml = NIL

VERB: ((BE ED))
NOUN1: HE
FUNCTION: MAIN

The msg is VERB and "was given" is a legal verb group, so the parse is continued as:

msg = VERB, ml = NIL

VERB: ((GIVE EN) (BE ED))
NOUN1: HE
FUNCTION: MAIN


"The" and "prize" are read in. The stack is collapsed and formatted. The result is

(GIVE PN
   [IOB HE]
   [OBJ (PRIZE THE)]
)
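The verb-group test added to VERB can be sketched as follows. The LEGAL_NEXT table is a hypothetical simplification of the tensing predicate the text mentions; it only encodes that BE may be followed by an EN or ING form and HAVE by an EN form. Verb groups are kept newest-first, as in ((GIVE EN) (BE ED)).

```python
from typing import Optional, Tuple

Verb = Tuple[str, str]  # (root, form), e.g. ("GIVE", "EN")

# Toy table: auxiliary root -> forms the next (newer) verb in the group may take.
LEGAL_NEXT = {"BE": {"EN", "ING"}, "HAVE": {"EN"}}


def legal_verb_group(group: Tuple[Verb, ...]) -> bool:
    """group is newest-first; each older verb must license the form of the newer one."""
    return all(newer[1] in LEGAL_NEXT.get(older[0], set())
               for newer, older in zip(group, group[1:]))


def verb_group_step(msg: str, top_verbs: Tuple[Verb, ...],
                    word: Verb) -> Optional[Tuple[Verb, ...]]:
    """Grammar.3 test in VERB: extend the verb group only right after a verb."""
    if msg != "VERB":
        return None
    group = (word,) + top_verbs
    return group if legal_verb_group(group) else None
```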

3.1.4  Grammar.4 

Grammar.4 extends Grammar.3 in two ways.

The first addition is a test for time and place referents that will be placed in the
NOUN program. This will enable the grammar to handle sentences like "I saw the
man downtown.", "Yesterday John was in town.", etc.

NOUN is augmented with a test which checks whether the noun group can be used
as a time or place (this is considered a syntactic property of the head noun of the
group). If so, a preposition structure is created with preposition equal to *TIME or
*PLACE. The preposition structure is pushed onto the stack and a new partial parse
is created.
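A sketch of the time/place test in Python, assuming noun groups are bare words or tuples with the head noun first, and that the time/place property is looked up in two hypothetical word sets. When the test succeeds, the function returns the extra stack with the *TIME or *PLACE preposition structure pushed on top; the ordinary noun attachment is left to the rest of NOUN.

```python
from typing import Optional, Tuple

# Hypothetical syntactic flags on head nouns.
TIME_NOUNS = {"YESTERDAY", "TODAY", "TOMORROW"}
PLACE_NOUNS = {"DOWNTOWN", "HOME"}


def head(group) -> str:
    # A noun group is either a bare word or a tuple with the head noun first.
    return group[0] if isinstance(group, tuple) else group


def time_place_parse(group, stack: Tuple[dict, ...]) -> Optional[Tuple[dict, ...]]:
    """Extra partial parse for a time/place referent, or None."""
    h = head(group)
    if h in TIME_NOUNS:
        prep = "*TIME"
    elif h in PLACE_NOUNS:
        prep = "*PLACE"
    else:
        return None
    return ({"kind": "PREP", "prep": prep, "noun": group},) + stack
```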

The second addition allows the parser to parse sentences with verbs that accept
other verbs as case fillers. An example of a verb with this property is "see". In "I
saw John leave town", the clause "John leave town" is a case of "saw". A test is
added to VERB which checks whether the main verb of a structure can accept a
clause. If so, an empty verb structure with function equal to WHAT is pushed onto the
stack and a new partial parse is created.
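The Grammar.4 clause test in VERB is top-down: as soon as a clause-taking verb is added, a second partial parse with an empty WHAT structure is created. A Python sketch under stated assumptions: CLAUSE_VERBS is a hypothetical lexicon property, None stands for msg = NIL, and the check that the top structure can accept a verb is assumed to have been made already.

```python
from typing import List, Optional, Tuple

CLAUSE_VERBS = {"SEE", "KNOW"}  # hypothetical: verbs whose cases include a clause


def verb_program_g4(verb: Tuple[str, str],
                    stack: Tuple[dict, ...]) -> List[Tuple[Optional[str], tuple]]:
    """Return (msg, stack) pairs; the top structure is assumed to accept a verb."""
    top = stack[0]
    continued = (dict(top, verb=verb),) + stack[1:]
    results = [("VERB", continued)]
    if verb[0] in CLAUSE_VERBS:
        # Second partial parse: an empty WHAT structure awaits the clause.
        what = {"kind": "VERB", "nouns": (), "verb": None, "function": "WHAT"}
        results.append((None, (what,) + continued))
    return results
```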

Grammar.4 handles sentences like "Yesterday the man knew John had returned."


"Yesterday" causes the formation of two partial parses: one in which it is treated
as a time referent, and one in which it is used as the first noun of the MAIN
structure.

1. msg = NOUN, ml = NIL

   NOUN1: YESTERDAY
   FUNCTION: MAIN

2. msg = NOUN, ml = NIL

   PREP: *TIME
   NOUN: YESTERDAY

   FUNCTION: MAIN


When "man" is input, it cannot be added to partial parse 1, since there is no
structure in the stack that can accept a noun. "man" can be added to partial parse
2 by collapsing the stack down to the MAIN structure and adding "man" to the MAIN
structure. This results in

msg = NOUN, ml = NIL

NOUN1: (MAN THE)
CASES: ((WHEN YESTERDAY))
FUNCTION: MAIN


as the COLLAPSE routine knows that preposition structures whose preposition is
*TIME fill the WHEN case of the verbs they modify.

"Know" can accept a clause, so the application of "knew" to the partial parse
above results in two different partial parses:

1. msg = VERB, ml = NIL

   VERB: ((KNOW ED))
   NOUN1: (MAN THE)
   CASES: ((WHEN YESTERDAY))
   FUNCTION: MAIN

2. msg = NIL, ml = NIL

   FUNCTION: WHAT

   VERB: ((KNOW ED))
   NOUN1: (MAN THE)
   CASES: ((WHEN YESTERDAY))
   FUNCTION: MAIN


"John" is added to both partial parses:

1. msg = NOUN, ml = NIL

   VERB: ((KNOW ED))
   NOUN1: (MAN THE)
   NOUN2: JOHN
   CASES: ((WHEN YESTERDAY))
   FUNCTION: MAIN

2. msg = NOUN, ml = NIL

   NOUN1: JOHN
   FUNCTION: WHAT

   VERB: ((KNOW ED))
   NOUN1: (MAN THE)
   CASES: ((WHEN YESTERDAY))
   FUNCTION: MAIN

"had" is applied to each partial parse as a verb. Partial parse 2 is continued by
adding "had" to the top structure of the stack. Partial parse 1 cannot be
continued.


The addition of "returned" to the stack produced by the application of "had"
produces

msg = VERB, ml = NIL

VERB: ((RETURN ED) (HAS ED))
NOUN1: JOHN
FUNCTION: WHAT

VERB: ((KNOW ED))
NOUN1: (MAN THE)
CASES: ((WHEN YESTERDAY))
FUNCTION: MAIN


The input sentence is exhausted. The stack is collapsed and the resulting
structure formatted.

(KNOW PN
   [WHEN YESTERDAY]
   [SUB (MAN THE)]
   [WHAT (RETURN PP
            [SUB JOHN]
         )]
)


3.2  Grammar  efficiency 

The primary objective in writing an efficient grammar is keeping the number of
partial parses low. This is accomplished by minimizing the number of ways a word
can be successfully applied to a partial parse. There are basically three different
ways of handling this within the Reader formalism.

R1. The use of the stack to avoid attaching sentence constituents to
each other until more information is learned about the nature of the
attachment.

R2.  The  use  of  one  stack  structure  to  represent  more  than  one  syntactic 
possibility. 

R3.  The  use  of  bottom-up  and  top-down  parsing  techniques  together. 

The simplest example of the first technique is the handling of sentence
constituents which can modify many different structures in the sentence (e.g.,
prepositional phrases, relative clauses, etc.). Such constituents are placed on the
stack, thereby avoiding the necessity of a different parse path for each sentence
structure that can accept them as a modifier. Woods, in [Woods 73], mentions a
similar feature, called "selective modifier placement". However, it seems limited to
the simple application mentioned above. More powerful uses of the stack are
obtained in conjunction with R2.

R2 makes use of the fact that in many cases, two or more syntactic possibilities
can be combined in a single parse structure. For example, consider a sentence
beginning "The boy that..." Obviously, "that" is part of a relative clause which will
modify "boy". But it is not clear whether "that" is

1. the subject of the relative clause ("The boy that likes ice cream..."),

2. a modifier of the subject of the relative clause ("The boy that girl
likes..."), or

3. a function word ("The boy that the girl likes...").

A single stack entry which covers all these possibilities is

S = NOUN1: THAT
    FUNCTION: RC

If a verb is applied to the stack containing S before a noun is applied, S will lead to
a successful parse (usage 1). Now suppose a noun is applied before a verb. If a noun
group can be made from "that", the modifiers on the modifier list, and the noun being
added, then the sentence involves usage 2, and "that" is replaced by the noun
group[3]. If a noun group cannot be constructed using "that", but can be made using
just the modifier list and the noun, then "that" is replaced by the noun group (usage
3).
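The choice between usages 2 and 3 when a noun arrives first can be sketched as two calls to a noun-group builder. The make_noun_group here is a toy (at most one determiner, modifiers newest-first) rather than Reader's predicate; usage 1 (a verb arriving first) and the message left for Format in the doubly possible case are not modeled.

```python
from typing import Optional, Tuple

NounGroup = Tuple[str, ...]
DETERMINERS = {"THE", "A", "AN", "THAT"}  # hypothetical determiner list


def make_noun_group(noun: str, mods: Tuple[str, ...]) -> Optional[NounGroup]:
    """Toy MAKE-NOUN-GROUP: mods newest-first; at most one determiner allowed."""
    if sum(1 for m in mods if m in DETERMINERS) > 1:
        return None
    return (noun,) + tuple(reversed(mods))


def apply_noun_to_that(noun: str, ml: Tuple[str, ...]) -> Optional[Tuple[int, NounGroup]]:
    """A noun arrives while S = NOUN1: THAT / FUNCTION: RC is on the stack.

    Returns (usage, replacement for NOUN1), or None if neither reading works.
    """
    with_that = make_noun_group(noun, ml + ("THAT",))  # "that" is the oldest modifier
    if with_that is not None:
        return 2, with_that      # "that" modifies the subject: "that girl"
    without = make_noun_group(noun, ml)
    if without is not None:
        return 3, without        # "that" was a function word
    return None
```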

R2 can be used with R1 in a slightly different way. Consider the two sentences:

1. "He saw the man running out the door."

2. "He saw the man running out the door drop the bag."

In sentence 1, "running out the door" is most likely interpreted as "what he saw
the man doing". In sentence 2, "running out the door" is a relative clause which
modifies "man". One structure,

S = VERB: ((RUN ING))
    FUNCTION: PARTICIPLE

can represent both interpretations. Which interpretation is used depends on the
conditions under which the stack is collapsed. The relative clause interpretation is
used if the stack is being collapsed to add a verb, and the "see" case filler
interpretation is used otherwise. A more detailed example can be found in section
3.2.5.

[3] If a noun group could also be made without using "that", a message is left which
indicates to Format that a choice between "that noun-group" and noun-group
should be offered.

Section 3.2.3 provides an example of R3. The following two sections contain
examples of R2.

3.2.1  Nouns  as  modifiers 

Virtually all English nouns can also be used as modifiers. In "The baseball bat is
used to hit the baseball", the first occurrence of "baseball" is used as a modifier,
while the second is used as a noun. The grammars in section 3.1.1 coped with this
by applying each noun to every possible partial parse as both a noun and a modifier.
The example sentence would have two partial parses after "baseball" was read.

1. msg = NOUN, ml = NIL

   NOUN1: (BASEBALL THE)
   FUNCTION: MAIN

2. msg = BEGIN, ml = (BASEBALL THE)

   FUNCTION: MAIN

It is true that one of the two parses will always be killed rather quickly, but it would
be better to avoid the overhead involved in carrying extra partial parses. As a noun
cannot modify a verb, there is no advantage to be gained from putting one on the
modifier list. When a noun acts as a modifier, it modifies one of the nouns that come
directly after it in the sentence. The second parse can be eliminated by adding a
test to the NOUN program that checks for:

1. msg = NOUN (meaning the last thing done to the stack was
the addition of a noun group to the top structure), and

2. the noun group consisting of word and the words in the
last noun group added to the top structure in the stack is a
legal noun group.


If the test succeeds, the last noun group added to the top structure in the stack is
replaced by the noun group consisting of word with the words in the replaced noun
group as its modifiers. Under this scheme, there would be only one partial parse for
a sentence beginning "The baseball..." (parse 1, shown above). If the next word in
the sentence were "bat", its application to parse 1 would result in

msg = NOUN, ml = NIL

NOUN1: (BAT THE BASEBALL)
FUNCTION: MAIN

since parse 1 meets the requirement of msg = NOUN and "the baseball bat" is a
legal[4] noun group.
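The compound-noun test of this section can be sketched in Python as a function returning the replacement noun group, or None when the test fails. The plural check is a hypothetical stand-in for noun-group legality, chosen so that "the baseballs bat" fails as the footnote on this test describes; the ordering rule (the old head becomes the last modifier) reproduces (BAT THE BASEBALL) and (PEOPLE THE CITY).

```python
from typing import Optional, Tuple

NounGroup = Tuple[str, ...]  # head noun first, e.g. ("BASEBALL", "THE")

PLURALS = {"BASEBALLS", "BATS", "CITIES"}  # toy: plurals may not modify another noun


def extend_noun_group(msg: str, last_group: NounGroup, word: str) -> Optional[NounGroup]:
    """Return the replacement for the last noun group added, or None."""
    if msg != "NOUN":        # condition 1: the last action added a noun group
        return None
    head, mods = last_group[0], last_group[1:]
    if head in PLURALS:      # condition 2 (toy legality check)
        return None
    # The old head becomes a modifier of the new head, after the old modifiers.
    return (word,) + mods + (head,)
```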

3.2.2 Relative clauses

Grammar.3 (section 3.1.3) parses relative clauses in essentially a top-down fashion.
When a noun is read, and the stack contains a structure with a noun which could be
modified by a relative clause, a verb structure with function equal to RC is created,
the noun is added to it, and the resulting structure is pushed onto the stack to
await the verb of the relative clause. If a sentence began "The city people...",
after "people" was read there would be two partial parses:

1. msg = NOUN, ml = NIL

   NOUN1: (PEOPLE THE CITY)
   FUNCTION: MAIN

2. msg = NOUN, ml = NIL

   NOUN1: PEOPLE
   FUNCTION: RC

   NOUN1: (CITY THE)
   FUNCTION: MAIN

If the complete sentence were "The city people hate is Tokyo.", the second partial
parse would lead to a parse: "hate" would be the verb of the RC verb structure
and "is" would be the verb of the MAIN structure. Parse 1 would use "hate" as
the verb of the MAIN structure, and the parse would be discontinued after "is" is
read, since the stack would not contain a verb structure which could accept "is". If
the complete sentence were "The city people favor bonds.", partial parse 1 would
lead to a parse. Parse 2 would be discontinued when the end of the sentence is
reached and the parser realizes that it cannot attach "people favor bonds" to "the
city". If the main verb of a sentence which begins with such a compound noun
takes an indirect object, then the sentence is syntactically ambiguous (e.g., "The
city people gave the bonds"). The parser must not refuse to add "bonds" to "people
favor" (which would kill the parse earlier), since the sentence might have been "The
city people favor bonds for is Tokyo."

[4] The test would fail if the sentence were "The baseballs bat ...", since "the
baseballs bat" is not a legal noun group.

This splitting can be avoided by making changes in the NOUN and VERB programs. In
the previous section, a test was added to NOUN which determined when it was
possible to replace the last noun group added to a structure with the noun group
consisting of word and the words in the old noun group. If that test succeeds, and
word is a legal noun group by itself, then instead of parsing for a possible relative
clause in a new partial parse (by pushing a verb structure whose function is RC
onto the stack), a message is inserted in the message slot of the top structure
explaining that it is possible to form a relative clause with the head noun of the last
noun group in the structure. In VERB, the method used to find an empty verb slot is
modified so that if no structure can be found with an empty verb slot, VERB tries to
find a structure whose message is POSSIBLE-RC.

These changes allow "The city people hate is Tokyo." to be parsed using only one
parse path. After "hate" is read, there is one partial parse:


msg = VERB, ml = NIL

VERB: ((HATE))
NOUN1: (PEOPLE THE CITY)
MESSAGE: POSSIBLE-RC
FUNCTION: MAIN

VERB tries to find an open verb slot to put "is" in. It can't find one, but it is able to
find a stack structure whose message is POSSIBLE-RC. It removes the message,
verb and head noun from the structure, forms a new verb structure, and places it in
the stack just above the old one. This forms a new stack,

VERB: ((HATE))
NOUN1: PEOPLE
FUNCTION: RC

NOUN1: (CITY THE)
FUNCTION: MAIN

which has a place for the verb "is".
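The modified search for a verb slot can be sketched as follows, with dict structures and a tuple stack (top first) as assumptions. When no empty verb slot exists, the first POSSIBLE-RC structure is split: its verb and the head noun of its last noun group move into a new RC structure placed just above it, and the remaining words form the demoted noun group, so (PEOPLE THE CITY) leaves (CITY THE) behind.

```python
from typing import Optional, Tuple

Stack = Tuple[dict, ...]  # stack[0] is the top structure


def find_verb_slot(stack: Stack) -> Optional[Tuple[int, Stack]]:
    """Return (index of a structure with an empty verb slot, stack), or None."""
    for i, s in enumerate(stack):
        if s["kind"] == "VERB" and s["verb"] is None:
            return i, stack
    # Fallback: split a POSSIBLE-RC structure.
    for i, s in enumerate(stack):
        if s.get("message") == "POSSIBLE-RC":
            group = s["nouns"][-1]               # e.g. ("PEOPLE", "THE", "CITY")
            head, rest = group[0], group[1:]
            demoted = (rest[-1],) + rest[:-1]    # -> ("CITY", "THE")
            old = {k: v for k, v in s.items() if k != "message"}
            old.update(verb=None, nouns=s["nouns"][:-1] + (demoted,))
            rc = {"kind": "VERB", "nouns": (head,), "verb": s["verb"], "function": "RC"}
            # The RC structure sits just above the old one, whose verb slot is now open.
            return i + 1, stack[:i] + (rc, old) + stack[i + 1:]
    return None
```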

3.2.3  Verbs  which  accept  clauses 

Grammar.4 (section 3.1.4) showed one way of handling verbs which can accept
clauses as case fillers. Like the first relative clause mechanism, it was essentially
top-down. When a verb that was able to accept a clause was added to a structure,
a second partial parse was created with an empty verb structure whose function
was WHAT pushed onto the stack. A better method is to wait for the verb of the
clause to arrive before sprouting another partial parse. "I saw the man in the store
steal the book." would then have one partial parse at the time "steal" was read:

msg = NOUN, ml = NIL

   PREP: IN
2. NOUN: (STORE THE)

   VERB: ((SEE-SAW))
   NOUN1: I
   NOUN2: (MAN THE)
1. FUNCTION: MAIN

"See-saw" is the verb used by Reader to represent either the past tense of "see"
or the present tense of "saw"; it has all the syntactic properties of both. If
something in the parse resolves which verb is intended, Reader makes the change.
When "steal" is read, VERB looks down the stack for a structure that can accept a
verb. It finds structure 1, which has a verb, "see-saw", that can accept a clause.
The stack is collapsed down to structure 1, yielding

   VERB: ((SEE-SAW))
   NOUN1: I
   NOUN2: (MAN THE (IN (STORE THE)))
1. FUNCTION: MAIN


A verb structure with function equal to WHAT is created to hold "steal". NOUN2 is
removed from structure 1 and placed in the new structure, which is pushed onto
the top of the stack. The verb "see-saw" has been changed to "see" by the
program which pushed the WHAT structure onto the stack, since "saw" cannot
accept a clause. The result is:


   VERB: ((STEAL))
   NOUN1: (MAN THE (IN (STORE THE)))
2. FUNCTION: WHAT

   VERB: ((SEE ED))
   NOUN1: I
1. FUNCTION: MAIN
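The bottom-up treatment of clause-taking verbs can be sketched as a function applied after the stack has been collapsed to the structure whose verb can take the clause. CLAUSE_VERBS and CLAUSE_READING are hypothetical lexicon entries; the latter records that when a clause attaches, "see-saw" must be read as the past tense of "see".

```python
from typing import Optional, Tuple

Stack = Tuple[dict, ...]  # stack[0] is the top structure

CLAUSE_VERBS = {"SEE", "KNOW", "SEE-SAW"}      # hypothetical lexicon property
CLAUSE_READING = {("SEE-SAW",): ("SEE", "ED")}  # reading kept when a clause attaches


def add_clause_verb(stack: Stack, verb: tuple) -> Optional[Stack]:
    """stack is already collapsed down to the structure whose verb takes the clause.

    Returns the new stack with a WHAT structure on top, or None."""
    s = stack[0]
    if s["verb"] is None or s["verb"][0] not in CLAUSE_VERBS or not s["nouns"]:
        return None
    *kept, moved = s["nouns"]  # the last noun becomes the clause's subject
    old = dict(s, nouns=tuple(kept), verb=CLAUSE_READING.get(s["verb"], s["verb"]))
    what = {"kind": "VERB", "nouns": (moved,), "verb": verb, "function": "WHAT"}
    return (what, old) + stack[1:]
```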




3.2.4  Conjunctions 

Conjunctions are similar to other sentence constituents in that, syntactically, they
usually can be attached to more than one sentence constituent. For example,

"The man in the suit and tie." ("suit" and "tie" form the conjunction.)

"The man in the suit and John." ("man" and "John" form the conjunction.)

"Bill bought the turntable John was selling because he needed the money."
("because he needed the money" specifies why "John was selling".)

"Bill bought the turntable John was selling because he liked the way it sounded."
("because he liked the way it sounded" specifies why "Bill bought".)

Ambiguities arising from which constituent the conjunction should be attached to are
handled by the stack and COLLAPSE. "The man in the suit and John" would be
parsed into the stack

   PREPOSITION: AND
3. NOUN: JOHN

   PREPOSITION: IN
2. NOUN: (SUIT THE)

   NOUN1: (MAN THE)
1. FUNCTION: MAIN

"And" (when acting as a conjunction between nouns) is treated as a preposition
syntactically. When the stack is collapsed, it is determined whether 3 should be
attached to 1 or 2.

Conjunctions between verbs are handled by pushing a verb structure whose
function is the conjunction onto the stack. "Bill bought the turntable John was
selling because he needed the money." would be parsed into:

   VERB: ((NEED ED))
   NOUN1: HE
   NOUN2: (MONEY THE)
3. FUNCTION: BECAUSE

   VERB: ((SELL ING) (BE ED))
   NOUN1: JOHN
2. FUNCTION: RC

   VERB: ((BUY ED))
   NOUN1: BILL
   NOUN2: (TURNTABLE THE)
1. FUNCTION: MAIN


When the stack is collapsed, it is determined (by the Interpreter, acting through
Format) whether 3 modifies 2 or 1.


At first glance, it would appear that applying a conjunction that can conjoin both nouns and verbs (or a conjunction that is also a preposition, e.g., "before" or "like") to a parse will result in two partial parses: one in which a verb clause is expected (a verb structure is pushed on the stack), and one in which just a noun is anticipated (a preposition structure is pushed on the stack). However, both expectations can be handled by pushing on a verb structure⁵ whose message is POSSIBLE-PREP, and by modifying Format so that it formats a verb structure whose message is POSSIBLE-PREP and whose verb slot is empty as if it were a preposition structure whose preposition is the value of the structure's FUNCTION slot and whose noun slot is the value of the structure's NOUN1 slot. Also, VERB has to be modified so that, when it searches down the stack for empty verb structures, it continues past those verb structures whose message is POSSIBLE-PREP.
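The POSSIBLE-PREP mechanism above, together with the acceptance condition given in footnote 5, can be sketched in Python. This is a minimal illustration under stated assumptions, not Reader's implementation. The `Structure` class and all function names are hypothetical. Only the slots needed here are modelled, and the top of the stack is assumed to be the last list element.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

POSSIBLE_PREP = "POSSIBLE-PREP"


@dataclass
class Structure:
    # Hypothetical, simplified stack entry; stack[-1] is the top of the stack.
    function: str                   # e.g. "MAIN" or "AND"
    verb: Optional[str] = None      # None means the verb slot is empty
    noun1: Optional[str] = None
    message: Optional[str] = None   # POSSIBLE_PREP for an "and"-style structure
    is_verb_structure: bool = True  # False for preposition structures


def can_accept_verb_conjunction(stack: List[Structure]) -> bool:
    """Footnote 5: the topmost non-POSSIBLE-PREP verb structure must contain a verb."""
    for s in reversed(stack):
        if s.is_verb_structure and s.message != POSSIBLE_PREP:
            return s.verb is not None
    return False


def verb_slot_candidates(stack: List[Structure]) -> List[int]:
    """Stack indices VERB may try for a new verb, topmost first.

    The downward search continues past empty POSSIBLE-PREP structures and
    stops at the first ordinary verb structure with an empty verb slot.
    """
    found = []
    for i in range(len(stack) - 1, -1, -1):
        s = stack[i]
        if s.is_verb_structure and s.verb is None:
            found.append(i)
            if s.message != POSSIBLE_PREP:
                break
    return found


def format_structure(s: Structure) -> Tuple:
    """Format an unfilled POSSIBLE-PREP structure like a preposition structure."""
    if s.message == POSSIBLE_PREP and s.verb is None:
        return (s.function, s.noun1)   # e.g. ("AND", "BILL")
    return (s.function, s.verb, s.noun1)


# "John likes Janet and Bill and George ..."
stack = [
    Structure("MAIN", verb="LIKE", noun1="JOHN"),
    Structure("AND", noun1="BILL", message=POSSIBLE_PREP),
    Structure("AND", noun1="GEORGE", message=POSSIBLE_PREP),
]
print(verb_slot_candidates(stack))     # [2, 1]
print(format_structure(stack[1]))      # ('AND', 'BILL')
```

Consider a stack whose MAIN structure still lacks its verb, as for a sentence beginning "John and ...". In that case `can_accept_verb_conjunction` returns False, which corresponds to the footnote's rule that "and" is then pushed as a preposition structure.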


Using this method, the stack for "John likes Janet and Bill ..." would be


⁵ Assuming the stack can accept a verb conjunction. The stack for a sentence beginning "John and ..." can only accept "and" as a noun conjunction. The general condition is that a stack cannot accept a verb conjunction if the topmost verb structure whose message is not POSSIBLE-PREP does not contain a verb. If the stack cannot accept a verb conjunction, the parse is continued by pushing a preposition structure on the stack.




    NOUN1: BILL
    MESSAGE: POSSIBLE-PREP
2.  FUNCTION: AND

    VERB: ((LIKE S))
    NOUN1: JOHN
    NOUN2: JANET
1.  FUNCTION: MAIN


If the sentence continued "John likes Janet and Bill hates Jill", "hates" would be placed in the verb slot of structure 2. If the sentence were simply "John likes Janet and Bill", the stack would be collapsed and the format of structure 2. would be

    (AND BILL)

which is the same as the format of the preposition structure

    PREPOSITION: AND
    NOUN: BILL

Finally, if the sentence were "John likes Janet and Bill and George hate Jill.", "hate" would be applied to the following stack:


    NOUN1: GEORGE
    MESSAGE: POSSIBLE-PREP
3.  FUNCTION: AND

    NOUN1: BILL
    MESSAGE: POSSIBLE-PREP
2.  FUNCTION: AND

    VERB: ((LIKE S))
    NOUN1: JOHN
    NOUN2: JANET
1.  FUNCTION: MAIN


VERB would first try to add "hate" to structure 3. This would fail, since "hate" and "George" do not agree in number. It would then try to add "hate" to structure 2., after having attached 3. to it. This would succeed, since "hate" and (BILL (AND GEORGE)) do agree. Note that if "hate" could have been added to structure 3. (if the sentence were "John likes Janet and Bill and the children hate Jill.", for instance), VERB would still have tried to attach "hate" to a structure lower down in the stack, so that all the possible meanings of the sentence could be uncovered. "John likes Janet and Bill and the children hate Jill." could mean either


[CONJ AND
    (LIKE NN
        [SUB JOHN]
        [OBJ JANET]
    )
    (HATE NN
        [SUB (AND BILL (CHILD !PL))]
        [OBJ JILL]
    )
]

or

[CONJ AND
    (LIKE NN
        [SUB JOHN]
        [OBJ (AND JANET BILL)]
    )
    (HATE NN
        [SUB (CHILD !PL)]
        [OBJ JILL]
    )
]

In producing the two parses above, Reader did not have to split into two parses until the word "hate" was encountered.


3.2.5 Verbs inflected with "ed" endings

Verbs inflected with an "ed" ending which are not preceded by auxiliary verbs can usually be applied to a parse (as verbs) in two different ways: as the main verb of a clause, as in "The police captured the robber.", or as a modifier following a noun, as in "The robber captured by the police was convicted." The grammar Reader uses combines the two possibilities into one.

When an "ed" verb is encountered, any combination of the following two conditions can be true:

1. There is a verb structure in the stack that has an empty verb slot.

2. There is a structure in the stack that has a noun which could be modified by a relative clause.

Suppose an "ed" verb is encountered:




If the last operation on the stack was the addition of a verb (msg = VERB), and the "ed" verb forms a legal verb group with the verb just added, it is added into the top structure in the stack as part of the verb group. VERB exits.

If 1. and 2. are true, then a verb structure is pushed on to the stack with FUNCTION equal to REL-OR-MAIN, VERB equal to the "ed" verb, and NOUN1 equal to !match_to_head_noun. If the verb clause is used as the predicate of the sentence, then !match_to_head_noun will be replaced by the NOUN1 of the structure it is added to.

If just 2. is true, then a verb structure is pushed on the stack with FUNCTION equal to REL.

If just 1. is true, the stack is collapsed down to the structure with the empty verb slot, and the verb is added.

If neither 1. nor 2. is true, then VERB simply exits. The parse will be continued by using the "ed" verb as a modifier.
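The case analysis above can be read as a small decision procedure. The Python sketch below is one possible rendering under several assumptions. `Entry`, `apply_ed_verb` and its two boolean flags are hypothetical. Whether the "ed" verb forms a legal verb group is passed in rather than computed. Giving the REL structure a !match_to_head_noun NOUN1 is also an assumption, since the text only specifies its FUNCTION. Finally, the collapse step is a crude placeholder that attaches everything above the target to it, rather than running Collapse.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MATCH = "!match_to_head_noun"


@dataclass
class Entry:
    # Hypothetical, simplified stack entry; stack[-1] is the top.
    kind: str                                # "VERB" or "PREP"
    function: str = ""
    verb: Optional[List[tuple]] = None       # verb group; None = empty verb slot
    noun1: Optional[str] = None
    rel_modifiable: bool = False             # holds a noun a relative clause could modify
    attached: List["Entry"] = field(default_factory=list)


def apply_ed_verb(stack: List[Entry], ed_verb: tuple,
                  last_op_was_verb: bool, forms_verb_group: bool) -> str:
    """Apply an "ed" verb not preceded by an auxiliary; returns the case taken."""
    if last_op_was_verb and forms_verb_group:
        stack[-1].verb.append(ed_verb)       # extend the verb group just added
        return "extend verb group"
    empty = [i for i, e in enumerate(stack) if e.kind == "VERB" and e.verb is None]
    cond1 = bool(empty)                                  # condition 1
    cond2 = any(e.rel_modifiable for e in stack)         # condition 2
    if cond1 and cond2:
        stack.append(Entry("VERB", "REL-OR-MAIN", [ed_verb], MATCH))
        return "push REL-OR-MAIN"
    if cond2:
        stack.append(Entry("VERB", "REL", [ed_verb], MATCH))
        return "push REL"
    if cond1:
        target = empty[-1]                   # topmost structure with an empty verb slot
        # Placeholder for COLLAPSE: attach everything above the target to it.
        stack[target].attached.extend(stack[target + 1:])
        del stack[target + 1:]
        stack[target].verb = [ed_verb]
        return "collapse and add verb"
    return "exit"                            # the "ed" verb will be used as a modifier


# "The man in the photograph framed ..."
stack = [Entry("VERB", "MAIN", noun1="(MAN THE)", rel_modifiable=True),
         Entry("PREP", "IN", noun1="(PHOTOGRAPH THE)", rel_modifiable=True)]
print(apply_ed_verb(stack, ("FRAME", "ED"), False, False))   # push REL-OR-MAIN
```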


These methods parse "The man in the photograph framed for the police was his father" as follows. The stack before "framed" is read and the stack after "police" is read are shown below.


Before "framed" is read ("The man in the photograph ..."):

    PREP: IN
2.  NOUN: (PHOTOGRAPH THE)

    NOUN1: (MAN THE)
1.  FUNCTION: MAIN

After "police" is read ("... framed for the police ..."):

    PREP: FOR
4.  NOUN: (POLICE THE)

    VERB: ((FRAME ED))
    NOUN1: !match_to_head_noun
3.  FUNCTION: REL-OR-MAIN

    PREP: IN
2.  NOUN: (PHOTOGRAPH THE)

    NOUN1: (MAN THE)
1.  FUNCTION: MAIN

A verb structure with FUNCTION equal to REL-OR-MAIN has been pushed on, since the stack contains both a structure with an empty verb slot (1.) and structures (both 1. and 2.) with a noun which could be modified by a relative clause. If the sentence ended with "police", the stack would be collapsed, and the deductive system would be asked to choose from among the three possible parses the stack could be collapsed to:

"The man in the photograph which was framed for the police."

(NOUN (MAN THE (IN (PHOTOGRAPH THE (FRAME PN
    [OBJ !match_to_head_noun]
    [FOR (FOR (POLICE THE))]
)))))

"The man in the photograph who was framed for the police."

(NOUN (MAN THE (IN (PHOTOGRAPH THE)) (FRAME PN
    [OBJ !match_to_head_noun]
    [FOR (POLICE THE)]
)))

"The man in the photograph did frame (photos or people) for the police."

(FRAME PN
    [SUB (MAN THE (IN (PHOTOGRAPH THE)))]
    [FOR (FOR (POLICE THE))]
)


The sentence continues with "was", however. The VERB program applies "was" to the stack by searching down the stack for a structure with an empty verb slot. It finds 1., and collapses the stack with the purpose of inserting a verb. This means that 3. cannot be attached to 1. as the main verb of the sentence, since that slot is now reserved for "was". The deductive system decides whether the man or the photograph was framed (we will assume "the man"), and "was" is inserted in the resulting structure. This yields

    VERB: ((BE ED))
    NOUN1: (MAN THE (IN (PHOTOGRAPH THE))
                (V FRAME PN (OBJ !match_to_head_noun)
                            (FOR (FOR (POLICE THE)))))
    FUNCTION: MAIN


and the parse is continued. In the parse of the complete sentence, the companion system never had to consider a meaning which used "the man" as the SUB of "frame".




4.  A closer  look 


This chapter explains some of the algorithms mentioned earlier in greater detail.


4.1  Measure 

Each stack structure has a slot set aside for its measure, which is used by Reader to help it choose among competing partial parses. The measure of a structure rates both the syntax and semantics of the structure. The deductive system (via Format) is responsible for determining the semantic component of a structure's measure. Section 6.6 explains how semantic measure is calculated in the Reader-Interpreter system.

Two measures are compared by first comparing their semantic components. If one measure has a better semantic rating (section 4.1.1) than the other, it is preferred. If the semantic components are equal, the measure with the better syntactic rating (section 4.1.2) is preferred. If both components are equal, the measures are equal. This comparison system prefers a very unusual (but legal) syntactic structure to a more common syntactic structure if the former is judged to be even slightly better semantically.

A structure is measured when it is formatted. Format returns the format of the structure as well as its measure, which is then merged¹ with the contents of the measure slot of the structure receiving the formatted structure. The measure of a structure, therefore, contains the measures of all the structures that have been attached to it.

¹ The merge of two measures, M1 and M2, is the measure whose semantic and syntactic components are the unions of the semantic and syntactic components of M1 and M2.

4.1.1  The  semantic  component 

The semantic component consists of three features. The Interpreter is responsible for rating each feature. A rating can have one of three values:

perfect:  The  Interpreter  is  perfectly  satisfied  with  this  feature. 

acceptable: The Interpreter would prefer something else, but the feature is acceptable.

unacceptable: The feature is unacceptable.

A semantic component A is better than a semantic component B if

1. A has fewer unacceptable features than B,
or

2. A and B have the same number of unacceptable features, and A has fewer features which are merely acceptable.

This algorithm would prefer a semantic component with only acceptable features to a component with one unacceptable feature and a large number of perfect features. An alternative method is to allow some number of perfect features to cancel the effects of an unacceptable feature.
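Here is a minimal Python sketch of rules 1 and 2. It assumes that a semantic component is represented simply as a list of the three rating strings, which is a hypothetical encoding. Comparing the pair (number of unacceptable ratings, number of acceptable ratings) lexicographically reproduces the rules. The alternative, in which perfect features can cancel an unacceptable one, is not modelled.

```python
from collections import Counter
from typing import Iterable, Tuple

PERFECT, ACCEPTABLE, UNACCEPTABLE = "perfect", "acceptable", "unacceptable"


def semantic_key(ratings: Iterable[str]) -> Tuple[int, int]:
    """Sort key: fewer unacceptable features first, then fewer merely acceptable ones."""
    counts = Counter(ratings)
    return (counts[UNACCEPTABLE], counts[ACCEPTABLE])


def semantically_better(a: Iterable[str], b: Iterable[str]) -> bool:
    """True if component a is strictly better than component b."""
    return semantic_key(a) < semantic_key(b)


# Only acceptable features beat one unacceptable feature plus many perfect ones.
print(semantically_better([ACCEPTABLE] * 3, [UNACCEPTABLE] + [PERFECT] * 10))  # True
```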

The  following  features  contribute  to  the  semantic  component. 

Verb  Cases 

Is  the  verb  well  modified?  The  ratings  are: 

perfect:  The  verb  has  all  the  cases  it  needs  to  be  well  defined. 

acceptable: The verb is missing some cases which are usually found with it.

unacceptable: The verb is missing some cases which are necessary.

"Put" is an example of a verb requiring a case, namely a where-put case. One almost never says "John put the ball". Therefore a verb structure whose main verb is "put" and which has no where-put case would be rated unacceptable. This does not prohibit the parser from parsing a sentence like "John put the ball". If that were the sentence the parser was given, the best structure the parser could find would be one whose measure contained a semantic component with at least one unacceptable rating.

An acceptable, but not perfect, case of verb modification can occur with verbs like "go". "Go" prefers a case explaining where the SUB has gone. However, it is fairly common to omit that case if it is implicit from some other information.

Noun  Modifications 

This is an evaluation of the appropriateness of each noun group in the structure. The ratings assigned are:

perfect: The noun group is perfect. The deductive system can find an object in its representation of what has been said which the noun group refers to.

acceptable: A referent cannot be found, but all the modifications in the noun group are meaningful to the deductive system, i.e., the deductive system will know how to interpret the noun group.

unacceptable: The deductive system cannot understand the proposed modifications.

Sometimes the rating given a noun group will depend on the context in which the sentence containing the noun group occurs. Consider the noun group "the student George". If there were two Georges and one of them was known to be a student, one might want to disambiguate which George was being referred to by using the phrase "the student George", as in "The student George is always busy". However, we would not want the parser to consider the phrase "the student George saw" as having a meaning other than "the student that George saw", except in such a context.

This feature is also responsible for measuring the fit of the modifiers coming after the noun. "The ball in the box" would be rated perfect if the Interpreter could find a ball in the box, acceptable if not. "The store he kissed" would be rated perfect if the Interpreter could locate a store that was kissed, unacceptable if not.

Appropriateness  of  Verb  Cases 

Most verbs prefer certain types to fill their cases. The Interpreter should have a verb frame for each verb, which it uses to evaluate how well the verb's cases fit it. [Reader can operate without this frame; it just means that one more level of discrimination is lost, which might result in Reader finding more interpretations of a sentence than a person would.] The values are:

perfect: The verb and case satisfy the Interpreter's expectations.

acceptable: The verb does not usually contain the case, but the Interpreter is aware of idioms that would cause the verb to receive it.

unacceptable: The Interpreter is unable to find any role for the case to play in the verb's definition.

The verb "give" prefers a human as its SUB, a non-human as its OBJ and a human as its IOB (recipient). Using these expectations enables a person to find only one meaning for "He gave the ball Bill gave the salesmen", namely




(GIVE PN
    [SUB HE]
    [OBJ (BALL THE (GIVE PN
        [SUB BILL]
        [IOB (SALESMAN THE)]
    ))]
)

and not consider,

(GIVE PN
    [SUB HE]
    [IOB (BALL THE (GIVE PN
        [SUB BILL]
    ))]
    [OBJ (SALESMAN THE)]
)


since the second interpretation assigns "give" a non-human for its IOB case and a human for its OBJ.


A parser cannot afford to reject possible parses that contain verbs that don't accept their cases, since one frequently uses verbs in ways which violate their case preferences, as in "He gave the bride away", "The noise gives him a headache" or "He gave the wall a kick".


4.1.2 The syntactic component

Reader tries to filter out some of the partial parses that are syntactically valid and semantically meaningful, and yet would not be selected by a person. If a structure has this property, it is marked in the syntactic component of its measure. The syntactic component with the fewest such markings is the best. A structure inherits the measure of any structure that is attached to it, so it is possible for the syntactic component of the measure of a structure to have more than one syntactic mark against it. Here is an example of this idea:


"The salesman crushed by the elevator was hurt" is understood by realizing that the verb phrase "the salesman crushed by the elevator" is the subject of "was". Using the same methods, Reader finds two meanings for "I saw the salesman crushed".


The only meaning most people would consider is M1: "I saw the act of the salesman being crushed",

(SEE PN
    [SUB I]
    [WHAT (CRUSH PN
        [OBJ (SALESMAN THE)]
    )]
)

Reader finds another interpretation, which is M2: "I saw the salesman who was crushed",

(SEE PN
    [SUB I]
    [OBJ (SALESMAN THE (CRUSH PN
        [OBJ !match_to_head_noun]
    ))]
)

People who want to convey the second meaning phrase the sentence differently, and people do not find "I saw the salesman crushed" ambiguous, so we do not want the parser to return two parses for it. The second meaning still has to be considered, since the parser may be given "I saw the salesman crushed by the elevator walk away unhurt". Reader marks the syntactic component of any verb structure whose verb can accept a clause and whose OBJ is a noun modified by a verb clause with !match_to_head_noun for a dummy OBJ. Thus, if Reader were given the example sentence "I saw the salesman crushed", M1 would have a better measure than M2, so Reader would return only one parse for the sentence.




It should be noted that the rules used in determining the measure of a structure are distinct from the rules used in the grammar. The rule used in the above example ("...mark any verb structure whose verb can accept a clause, and whose OBJ is a noun modified by a verb clause with !match_to_head_noun for an OBJ") may seem somewhat ad hoc. But this rule in no way affects the structuring of an input sentence. It is merely used to filter structures that the parser finds. Without this rule, the system working with the parser would have to decide for itself whether "I saw the salesman crushed" meant M1 or M2.
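The two components can be combined as the comparison in section 4.1 describes: lexicographically, semantics first, then the number of syntactic marks. The sketch below makes several assumptions. It uses a hypothetical `Measure` class that stores ratings and marks as lists. It reads the "union" of footnote 1 as keeping every rating and mark from both measures (concatenation), which is what lets a structure accumulate the marks of everything attached to it. The mark label in the usage lines is invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Measure:
    semantic: List[str] = field(default_factory=list)   # "perfect" / "acceptable" / "unacceptable"
    syntactic: List[str] = field(default_factory=list)  # one entry per syntactic mark

    def merge(self, other: "Measure") -> "Measure":
        # Assumed reading of footnote 1: keep every rating and mark from both measures.
        return Measure(self.semantic + other.semantic,
                       self.syntactic + other.syntactic)

    def key(self) -> Tuple[int, int, int]:
        # Semantic component first, then the number of syntactic marks.
        c = Counter(self.semantic)
        return (c["unacceptable"], c["acceptable"], len(self.syntactic))


def better(m1: Measure, m2: Measure) -> bool:
    """True if m1 is strictly preferred to m2."""
    return m1.key() < m2.key()


m1 = Measure(["perfect", "perfect"])                       # M1: the WHAT-case reading
m2 = Measure(["perfect", "perfect"], ["OBJ-clause-mark"])  # M2: marked by the rule above
print(better(m1, m2))  # True
```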

Other parsers have used variants of a "measure" concept. Robinson [Robinson 75] uses the term factor score to refer to how well various syntactic features "fit" together. In theory, this seems quite similar to the syntactic component just defined. In practice, it is used quite differently, since the motivation for factor scores lies in the ambiguous inputs a speech parser must deal with. Reader uses the measure of a structure to help it choose from among completed parse structures, or from among structures resulting from the collapse of a stack segment. Measure is never used to determine how a word should be applied to a parse, or whether or not to continue a parse. In contrast, factor scores are primarily used to determine the priority of active parse paths. A factor score of "out" eliminates a parse path. An example of an "out" factor score is the combination of "foot" and "s": presumably, the speaker intended the "s" as the first letter of the word following "foot", rather than the last letter of the incorrect plural "foots". This level of detail is unnecessary in a parser intended for written input.

In many cases, the syntactic measure can be done away with in favor of more efficient parsing methods. In the example above, the syntactic measure is needed whenever the grammar "splits" on a verb inflected with "ed", creating one parse in which the "ed" verb is the main verb of a clause and one in which the "ed" verb is part of an embedded clause modifying a noun. In a grammar which did not split (see section 3.2.5), "I saw the salesman crushed by the elevator" would be divided into:

    PREP: BY
3.  NOUN: (ELEVATOR THE)

    VERB: ((CRUSH ED))
    NOUN1: !match_to_head_noun
2.  FUNCTION: REL

    VERB: ((SEE ED))
    NOUN1: I
    NOUN2: (MAN THE)
1.  FUNCTION: MAIN

When the stack is collapsed, 2. would be attached to 1. as the WHAT case of "see", and !match_to_head_noun would be replaced by "the man". If the sentence were "I saw the man crushed by the elevator walk away.", then when "walk" was read, the only place to put it would be the verb slot of the WHAT case of "see". Therefore the stack would be collapsed with the purpose "VERB", meaning "Don't fill up any verb slots." This would cause 2. to be attached to 1. as a modifier of "man", rather than as the WHAT case of "see".


4.2  Collapsing 

Collapsing a stack (or stack segment) consists of converting it into a single stack structure by attaching all the structures in the stack to each other until there is only one left that has not been attached to any other. The methods used to build the stack ensure that structures will only modify structures beneath them in the stack. There is one "syntactic" constraint the collapse must satisfy. Given a stack $[S_n, S_{n-1}, \ldots, S_2, S_1]$, if $S_k$ is attached to $S_j$, then for all $i$ with $k > i > j$, $S_i$ cannot be attached to any $S_m$ with $j > m$. This constraint, which may be viewed as a nesting condition, reflects the syntax of English. As an illustration, the stack [D C B A] could be collapsed in five different ways:


(A B C D)        A modified independently by B, C and D.
(A B (C D))      A modified independently by B and by C, with C modified by D.
(A (B C D))      A modified by B, which is modified independently by C and D.
(A (B (C D)))    A modified by B, modified by C, modified by D.
(A (B C) D)      A modified independently by B (itself modified by C) and by D.


It can't be collapsed so that D modifies B, which then modifies A, and C modifies A, since this would violate the nesting condition.
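The nesting condition can be checked mechanically. In the Python sketch below (all names are hypothetical), a collapse is encoded as a list `parent`. In this list, `parent[k]` is the index of the structure that S_k is attached to, with index 1 at the bottom of the stack. Enumerating every attachment choice and filtering by the condition recovers the five collapses of [D C B A] listed above and rejects the one just described. Brute-force enumeration is used only for illustration; as the text goes on to explain, Collapse deliberately avoids generating the whole set.

```python
from itertools import product
from typing import List, Optional


def satisfies_nesting(parent: List[Optional[int]]) -> bool:
    """parent[k] = j (j < k) means S_k is attached to S_j; parent[0], parent[1] unused.

    If S_k is attached to S_j, no S_i with j < i < k may attach to an S_m with m < j.
    """
    n = len(parent) - 1
    for k in range(2, n + 1):
        j = parent[k]
        for i in range(j + 1, k):
            if parent[i] < j:
                return False
    return True


def legal_collapses(n: int) -> List[List[Optional[int]]]:
    """Every attachment choice for S2..Sn, filtered by the nesting condition."""
    choices = [range(1, k) for k in range(2, n + 1)]
    return [[None, None, *combo] for combo in product(*choices)
            if satisfies_nesting([None, None, *combo])]


def render(parent: List[Optional[int]], names: str) -> str:
    """Write a collapse in the text's notation, e.g. (A (B C) D)."""
    n = len(parent) - 1
    children = {k: [] for k in range(1, n + 1)}
    for k in range(2, n + 1):
        children[parent[k]].append(k)

    def show(k: int) -> str:
        if not children[k]:
            return names[k - 1]
        return "(" + " ".join([names[k - 1]] + [show(c) for c in children[k]]) + ")"

    return show(1)


for p in legal_collapses(4):
    print(render(p, "ABCD"))
```

Run as shown, the loop prints (A B C D), (A B (C D)), (A (B C) D), (A (B C D)) and (A (B (C D))). The forbidden collapse corresponds to `satisfies_nesting([None, None, 1, 1, 2])`, which returns False.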


Depending on the stack, each one of the above structures could be the meaning intended in the sentence, so the Collapse algorithm must be able to consider each possible collapse and return the one(s) with the best measure.


The following sentence illustrates the fact that any one of the five structures could be the preferred interpretation of a four-structure stack. "He puts the block in the box in the carton on the table." would be divided into

D. on the table
C. in the carton
B. in the box
A. He puts the block

Depending on the circumstances the sentence occurred in, it could mean any of the following:

(A (B (C D))) -- The box is in the carton, the carton is on the table, and the block is put in the box. [When B modifies A, it can modify either the location of the block or where the block was put. If only B modifies A directly, then it must specify where the block was put. If there is another modifier that could specify where the block was put, then B specifies the location of the block.]

(A (B C) D) -- The block is in the box, the box is in the carton, and the block is put on the table.




(A B (C D)) -- The block is in the box, the carton is on the table, and the block is put in the carton.

Changing D to "on Thursday" yields

(A B C D) -- The block is in the box. It is put in the carton. The action is done on Thursday.

Changing C to "with the cover" yields

(A (B C D)) -- The box has a cover. The box is on the table. The block is put in the box.

The simplest algorithm for collapsing the stack would be to generate all legal collapses and then choose one with the best measure. This method is not used because the number of structures a stack can be collapsed to grows exponentially with the length of the stack. In fact, the sequence followed is the Catalan² sequence, which is (1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, ...). The closed form for the Nth term of the sequence is
$$\frac{(2(N-1))!}{(N-1)!\,N!} = \text{the number of ways a stack of length } N \text{ can be collapsed.}$$

So  it  is  obvious  that  we  will  want  to  use  a more  intelligent  method  for  collapsing. 
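The closed form is easy to check numerically. The short Python sketch below evaluates it for N = 1 to 11 and reproduces the sequence quoted above. The function name is hypothetical.

```python
from math import factorial


def collapse_count(n: int) -> int:
    """Closed form from the text: (2(N-1))! / ((N-1)! N!), for a stack of length N."""
    return factorial(2 * (n - 1)) // (factorial(n - 1) * factorial(n))


print([collapse_count(n) for n in range(1, 12)])
# [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
```

The quotient is always an integer (these are the Catalan numbers), so integer division loses nothing.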

The set of structures a stack S may be reduced to is called the collapse set. We wish to generate the members of the collapse set in an order that gives us the best chance of finding the preferred structure in the set before generating the entire set.

In English usage, sentence constituents have a tendency to modify the constituents that are closest to them in the sentence. In a stack, this translates as "a stack structure is most likely to modify the one directly beneath it in the stack." Our heuristic is to generate the members of the collapse set that have the "closest modifications" first³, and to stop as soon as we generate a structure with a perfect measure.

² Which, among other things, counts the number of ways a convex polygon with N+1 sides can be triangulated [Gardner 76].

We define a metric to measure how well a member of the collapse set fits the "close modification" criterion. The metric counts, for each n, the number of structures in the stack that jump over n structures to reach the structure they modify. $S(N_1, N_2, \ldots, N_k)$ is the subset of the collapse set whose members contain $N_1$ structures that jump over one structure to find the structure they modify, $N_2$ structures that jump over two structures to find the structure they modify, and so on. The members of $S(N_1, N_2, \ldots, N_k)$ are more closely modified than the members of $S(M_1, M_2, \ldots, M_k)$ if and only if $\sum_{i=1}^{k} N_i < \sum_{i=1}^{k} M_i$, or the sums are equal and there exists $j$ ($1 \le j \le k$) such that $N_j > M_j$ and $N_i = M_i$ for all $i < j$. For example, for a stack of five structures, the structure with the closest modifications is in $S(0,0,0)$, the structures in $S(1,0,0)$ are the next most likely interpretations of the stack, and the structures in $S(2,0,0)$ are preferred over those in $S(1,1,0)$. The Collapse routine generates the structures with the closest modifications first, with one important exception. Suppose the modification of structure N by structure M leads to a bad measure. Then every final structure in which M modifies some other structure with a better measure than it does N is generated before those containing N modified by M, even though the latter may be more closely modified.
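The closeness ordering can be expressed as a sort key. The Python sketch below encodes a collapse as a list `parent`, where `parent[k]` is the index of the structure that S_k is attached to (index 1 at the bottom); the names are hypothetical. It computes the vector (N1, ..., Nk) and orders vectors by their total. Ties are broken in favour of the larger count at the first position where the vectors differ, which reproduces the S(2,0,0)-before-S(1,1,0) example. The bad-measure exception is not modelled.

```python
from typing import Optional, Sequence, Tuple


def jump_vector(parent: Sequence[Optional[int]]) -> Tuple[int, ...]:
    """Return (N1, ..., Nk), with k = n - 2 for an n-structure stack.

    A structure S_k attached to S_j jumps over k - j - 1 structures.
    """
    n = len(parent) - 1
    counts = [0] * max(n - 2, 0)
    for k in range(2, n + 1):
        jumped = k - parent[k] - 1
        if jumped > 0:
            counts[jumped - 1] += 1
    return tuple(counts)


def closeness_key(vec: Sequence[int]) -> Tuple:
    """Smaller total first; on a tie, the larger count at the first differing index wins."""
    return (sum(vec), tuple(-x for x in vec))


vectors = [(1, 1, 0), (0, 0, 0), (2, 0, 0), (1, 0, 0)]
print(sorted(vectors, key=closeness_key))
# [(0, 0, 0), (1, 0, 0), (2, 0, 0), (1, 1, 0)]
```

For example, the collapse (A B C D) of a four-structure stack, encoded as `[None, None, 1, 1, 1]`, lies in S(1,1).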

Here  is  how  this  works  on  the  sentence, 

³ There are certain exceptions: for example, if a verb structure in the stack has a passive verb group, and there is a preposition structure whose preposition is "by" above it, then the collapse routine tries to attach the "by" preposition structure to the verb structure first.


"Write me a program called Intersection which reads a set of lists of numbers and prints the numbers which are in all of them."


The stack to be collapsed is

    PREP: OF
9.  NOUN: THEM

    PREP: IN
8.  NOUN: ALL

    VERB: ((BE))
    NOUN1: (WHICH !PL)
7.  FUNCTION: WHICH

    VERB: ((PRINT S))
    NOUN1: !match_to_conjunct_sub
    NOUN2: (NUMBER !PL THE)
6.  FUNCTION: AND

    PREP: OF
5.  NOUN: (NUMBER !PL)

    PREP: OF
4.  NOUN: (LIST !PL)

    VERB: ((READ . S))
    NOUN2: (SET A)
    NOUN1: (WHICH !SING)
3.  FUNCTION: WHICH

    VERB: ((CALL . ED))
    NOUN2: INTERSECTION
    NOUN1: !match_to_head_noun
2.  FUNCTION: PASS

    VERB: ((WRITE))
    NOUN3: (PROGRAM A)
    NOUN2: ME
    NOUN1: YOU*
    MSG: (IMP)
1.  FUNCTION: MAIN


or  more  simply, 




9. of them
8. in all
7. which are
6. and prints the numbers
5. of numbers
4. of lists
3. which reads a set
2. called Intersection
1. write me a program


Collapse begins by trying to generate (1 (2 (3 (4 (5 (6 (7 (8 9)))))))), the only member of S(0,0,0,0,0,0,0). It successfully forms (6 (7 (8 9))) and tries to attach it to 5. It cannot, since 6. must be attached to a verb structure. An illegal attachment and an attachment with a bad measure are handled similarly⁴. Collapse now looks down the stack for the closest structure which will accept 6. with a perfect measure. It finds 3., which means it now has to collapse the stack segment from 6. to 3. It calls itself recursively on the stack consisting of 5., 4. and 3., which results in the structure (3 (4 5)). The structure (6 (7 (8 9))) is attached to it, and Collapse goes back to work on the stack consisting of 1., 2. and (3 (4 5) (6 (7 (8 9)))). The result is


⁴ That is, if the attachment were legal but had a bad measure, Collapse would immediately start looking for a better place to put it. If none were found, it would settle for the bad measure.




(IMP (INWRITE NN
    [ARG1 YOU*]
    [ARG3 ME]
    [ARG2 (PROGRAM A (INCALL PN
            [ARG1 !match_to_head_noun]
            [ARG2 ININTERSECTION]
        )
        [CONJ AND
            (INREAD NN
                [STEPOF !match_to_head_noun]
                [ARGS (SET A (OF (LIST !PL (OF (NUMBER !PL)))))]
            )
            (INOUTPUT NN
                [STEPOF !match_to_conjunct_sub]
                [ARGS (NUMBER THE !PL (INBE NN
                    [ARG1 !match_to_head_noun]
                    [ARG2 (ALL (OF THEM))]


4.3  Formatting 

Format is the algorithm which prepares a structure for output. It is responsible for calling the deductive system to measure the structure.

4.3.1  Noun  groups 

The noun group of an unformatted structure is a list of the head noun and its modifiers. This list is handed to the deductive system, which structures it and returns a measure of the appropriateness of the noun group. The representation used for the noun group's structure is dependent on the needs of the deductive system. Suppose Format were given a structure containing the noun group

NOUN:  (PROGRAM  THEORY  FORMATION  THE) 




The deductive system would be asked to structure it. The structure returned by the Interpreter (chapter 5) would be:

NOUN: PROGRAM
program-type: THEORY-FORMATION
definite: T
MEASURE: PERFECT

where  "THEORY-FORMATION"  Is  an  atom  denoting  a certain  kind  of  program. 

The noun group representation used by the deductive system does not matter to Reader, since once a structure is formatted, the parser no longer accesses it. The important piece of information, as far as Reader is concerned, is the measure of the noun group. It is not unreasonable to expect the deductive system to be capable of supplying such a measure: a system's ability to represent a noun group in a useful fashion implies that it has a measure of how well the noun group fits the representation.

The structured noun group is returned in the proper slot of Format's output. The measure of the noun group is added into the structure's measure, which will be returned along with the formatted structure.
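The noun-group step of Format can be pictured with a small Python sketch. It is illustrative only: `deductive_structure` is a hypothetical stand-in for the deductive system that understands nothing beyond the PROGRAM example above, and the numeric `MEASURE_COST` scale (lower is better, PERFECT adds nothing) is an assumption made here so that a noun group's measure can literally be added into the structure's measure.

```python
# Hypothetical cost scale for measures: lower totals are better, PERFECT adds nothing.
MEASURE_COST = {"PERFECT": 0, "GOOD": 1, "POOR": 3, "BAD": 10}


def deductive_structure(noun_group):
    """Stand-in for the deductive system: structure a noun group and measure it.

    It only recognizes the PROGRAM example from the text; anything else gets a
    flat structure and a middling measure.
    """
    head, *modifiers = noun_group
    structured = {"NOUN": head}
    if "THE" in modifiers:
        structured["definite"] = True
        modifiers = [m for m in modifiers if m != "THE"]
    if head == "PROGRAM" and modifiers:
        structured["program-type"] = "-".join(modifiers)
        return structured, "PERFECT"
    return structured, ("POOR" if modifiers else "GOOD")


def format_noun_group(structure):
    """Replace the raw noun group with its structured form and add in its measure."""
    structured, measure = deductive_structure(structure["NOUN"])
    return {"NOUN": structured,
            "MEASURE": structure.get("MEASURE", 0) + MEASURE_COST[measure]}
```

With the noun group (PROGRAM THEORY FORMATION THE), the sketch yields the structure shown above and leaves the incoming measure unchanged, since a PERFECT measure costs nothing on this invented scale.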

4.3.2  Conjunctions 

Format is responsible for bringing conjunctions up to their proper level in the sentence. "He reads books and writes poetry and music" would be parsed into




    NOUN: MUSIC
3.  FUNCTION: AND

    VERB: ((WRITE S))
    NOUN1: !match_to_conjunct_SUB
    NOUN2: POETRY
2.  FUNCTION: AND

    VERB: ((READ S))
    NOUN1: HE
    NOUN2: (BOOK !PL)
1.  FUNCTION: MAIN


When the stack is collapsed, 3. would be attached to 2., yielding

    VERB: ((WRITE S))
    NOUN1: !match_to_conjunct_SUB
    NOUN2: (POETRY (AND MUSIC))
2.  FUNCTION: AND

When 2. is formatted, the conjunction (which until now has been treated just like a preposition) in NOUN2 is brought up to top level, producing (AND POETRY MUSIC). When the format of 2. is attached to 1., it is placed in the cases slot:


    VERB: ((READ S))
    NOUN1: HE
    NOUN2: (BOOK !PL)
    CASES: ((AND (WRITE NN ((SUB !match_to_conjunct_SUB)
                            (OBJ (AND POETRY MUSIC))))))
1.  FUNCTION: MAIN


Format brings it up to top level so that the result of the parse is easily seen to be a conjunction:


[CONJ AND
    (READ NN
        [SUB HE]
        [OBJ (BOOK !PL)]
    )
    (WRITE NN
        [SUB !match_to_conjunct_SUB]
        [OBJ (AND POETRY MUSIC)]
    )
]




The symbol "!match_to_conjunct_SUB" (section 2.4.6) refers to the SUB of the first conjunct ("he").
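The two conjunction moves described above can be sketched in Python, assuming (for illustration only) that noun groups are tuples and verb structures are dicts with a CASES list. `raise_noun_conjunction` turns (POETRY (AND MUSIC)) into (AND POETRY MUSIC); `raise_clause_conjunction` lifts a conjoined clause out of CASES so the whole parse becomes a CONJ. Neither function is taken from Reader.

```python
CONJUNCTIONS = {"AND", "OR"}


def raise_noun_conjunction(noun):
    """(POETRY (AND MUSIC)) -> (AND POETRY MUSIC); other nouns come back unchanged."""
    if isinstance(noun, tuple) and len(noun) == 2:
        head, modifier = noun
        # A conjunction that was attached like a preposition shows up as a modifier.
        if isinstance(modifier, tuple) and modifier and modifier[0] in CONJUNCTIONS:
            conj, *rest = modifier
            return (conj, head, *rest)
    return noun


def raise_clause_conjunction(clause):
    """Move a conjoined clause out of CASES so the parse is visibly a conjunction."""
    cases = clause.get("CASES", [])
    for i, case in enumerate(cases):
        if case[0] in CONJUNCTIONS:
            conj, second = case
            first = dict(clause, CASES=cases[:i] + cases[i + 1:])
            return ("CONJ", conj, first, second)
    return clause
```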


4.3.3 Filling in extra cases

Format provides a channel for the deductive system to determine if there are any missing cases in the verb that can be filled in from the rest of the sentence.

Consider the sentence "John drove through and destroyed the plate glass window.", taken from [Woods 73]. Syntactically, it is possible for the object of the preposition "through" to be "the plate glass window." Reader asks the deductive system if this would make sense. If the answer is affirmative, Format would return

[CONJ AND
    (DRIVE PN
        [SUB JOHN]
        [WHERE (THROUGH (WINDOW THE PLATE GLASS))]
    )
    (DESTROY PN
        [SUB !match_to_conjunct_SUB]
        [OBJ !match_to_conjunct_PREP]
    )
]

where "!match_to_conjunct_PREP" is to be matched to "the plate glass window". Notice that Reader cannot add cases to a verb without consulting the deductive system. In the sentence "John drove through and destroyed her confidence in him.", the object of "through" is not "her confidence in him".
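A sketch of this consultation, in Python: the deductive system is reduced to a hypothetical callback `case_for` that names the case to fill, or returns None when the reading makes no sense. The `DANGLING_PREP` key is an invented way of marking a preposition that ended its conjunct without an object, and the toy knowledge in `toy_case_for` accepts driving through a window but not through a confidence.

```python
def fill_dangling_preposition(first, second, case_for):
    """Let the second conjunct's OBJ also serve as the object of a preposition
    left dangling at the end of the first conjunct, if case_for approves.

    case_for(verb, prep, obj) stands in for the deductive system: it returns
    the case name to fill (e.g. "WHERE") or None.
    """
    prep, obj = first.get("DANGLING_PREP"), second.get("OBJ")
    if prep is None or obj is None:
        return first, second
    case = case_for(first["VERB"], prep, obj)
    if case is None:                       # the reading makes no sense: change nothing
        return first, second
    first = {k: v for k, v in first.items() if k != "DANGLING_PREP"}
    first[case] = (prep, obj)
    second = dict(second, OBJ="!match_to_conjunct_PREP")
    return first, second


def toy_case_for(verb, prep, obj):
    # Hypothetical world knowledge: one can drive through a window, not a confidence.
    return "WHERE" if (verb, prep, obj[0]) == ("DRIVE", "THROUGH", "WINDOW") else None
```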

4.3.4  Choices 

Any choices in the parse structure (section 2.3.4) are generated in Format. Consider the choice offered for the SUB of "be" in




"I know that ice is slippery."

(KNOW PN
    [SUB I]
    [WHAT (BE PN
              [SUB (CHOICE ICE
                           (ICE THAT))]
              [DES SLIPPERY]
          )]
)


Just before Format asks the deductive system to structure a noun, it examines it to see if a choice can be made from it. In this case, the test that succeeds is that the noun is modified by "that" and is the SUB of a verb which belongs to a structure whose function is WHAT. The consequence of the test is that a choice of noun groups should be offered, one with "that" as a modifier, and one without "that". If the original sentence had been "I know that that ice is slippery", the second "that" would not have been added to the modifier list. Instead, a message would have been left in the message slot of the verb structure which would have signalled Format not to test for this particular choice being present.
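The test for the "that" choice might be written as follows. This is a hypothetical Python rendering of the rule just described, with noun groups as tuples (head first) and an invented message name, `NO-THAT-CHOICE`, standing in for whatever the grammar actually leaves in the verb structure's message slot.

```python
def noun_group_choices(noun_group, case, function, messages=frozenset()):
    """Return the alternative noun groups Format would offer (hypothetical sketch).

    A noun modified by THAT which is the SUB of a verb in a WHAT structure
    yields two readings, unless a message suppresses the test.
    """
    noun_group = tuple(noun_group)
    head, modifiers = noun_group[0], noun_group[1:]
    if ("NO-THAT-CHOICE" not in messages and case == "SUB"
            and function == "WHAT" and "THAT" in modifiers):
        without = (head,) + tuple(m for m in modifiers if m != "THAT")
        return [without, noun_group]   # offered as a choice between the two readings
    return [noun_group]
```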


4.4  Parallel  processing 

Reader is designed to follow partial parses in parallel. If this were implemented straightforwardly, it would lead to an unfortunate amount of duplicated effort. Consider the parsing of the sentence "He had another look at the man in the trench coat who had been following him for the last hour." When "at" is read there are two partial parses:




1.  msg = NOUN, ml = NIL

    VERB: ((HAVE ED))
    NOUN1: HE
    NOUN2: (LOOK ANOTHER)
    FUNCTION: MAIN

2.  msg = VERB, ml = NIL

    VERB: ((LOOK))
    NOUN1: ANOTHER
    FUNCTION: WHAT

    VERB: ((HAVE ED))
    NOUN1: HE
    FUNCTION: MAIN


If Reader used simple parallel processing, "at" would be added to both partial parses, producing

1.  msg = PREP, ml = NIL

    PREP: AT

    VERB: ((HAVE ED))
    NOUN1: HE
    NOUN2: (LOOK ANOTHER)
    FUNCTION: MAIN

2.  msg = PREP, ml = NIL

    PREP: AT

    VERB: ((LOOK))
    NOUN1: ANOTHER
    FUNCTION: WHAT

    VERB: ((HAVE ED))
    NOUN1: HE
    FUNCTION: MAIN


At this point, both stacks have the same top structure. The rest of the sentence, consisting of the noun group "the man in the trench coat who had been following him for the last hour", is going to be parsed twice, once for each partial parse. The different partial parses arose because words were applied to a single partial parse in different ways. This necessitated two different parses, because each could accept words differently: parse 2. was able to accept "look" as a verb and parse 1. was able to accept it as a noun. But now that the stacks of each partial parse have the same top structure, most words will be added to the stacks in the same fashion. We can take advantage of this fact to avoid parsing the object of "at" twice.


In general, whenever two (or more) partial parses have identical top structures, they are merged into one partial parse with a branching stack. The two partial parses above would be merged to:
msg = PREP, ml = NIL

    PREP: AT
      |
      +-- branch 1:
      |       VERB: ((HAVE ED))
      |       NOUN1: HE
      |       NOUN2: (LOOK ANOTHER)
      |       FUNCTION: MAIN
      |
      +-- branch 2:
              VERB: ((LOOK))
              NOUN1: ANOTHER
              FUNCTION: WHAT

              VERB: ((HAVE ED))
              NOUN1: HE
              FUNCTION: MAIN


The stack branching is invisible to the grammar programs. When a SHIFT is called on a branched stack, it automatically follows down all the branches and separates the branched stack as required. In this case, the merge of the two partial parses cuts the parsing time for the rest of the sentence in half: the succeeding words in the sentence are applied to one partial parse, instead of two. Since none of the words in the remainder of the sentence are attached to structures below the current top of the stack, the two partial parses remain merged until the end of the sentence. After the last word in the sentence has been read, the stack looks like:




msg = NOUN, ml = NIL

    PREP: FOR
    NOUN: (HOUR LAST THE)

    VERB: ((FOLLOW ING) (BEEN) (HAVE ED))
    NOUN1: WHO
    NOUN2: HIM
    FUNCTION: WHO

    PREP: IN
    NOUN: (COAT TRENCH THE)

    PREP: AT
    NOUN: (MAN THE)
      |
      +-- branch 1:
      |       VERB: ((HAVE ED))
      |       NOUN1: HE
      |       NOUN2: (LOOK ANOTHER)
      |       FUNCTION: MAIN
      |
      +-- branch 2:
              VERB: ((LOOK))
              NOUN1: ANOTHER
              FUNCTION: WHAT

              VERB: ((HAVE ED))
              NOUN1: HE
              FUNCTION: MAIN


Collapsing this stack produces two different parses:


(HAVE PN
    [SUB HE]
    [OBJ (LOOK ANOTHER (AT (MAN THE (IN (COAT THE TRENCH))
                                (FOLLOW PPC
                                    [SUB !match_to_head_noun]
                                    [OBJ HIM]
                                    [FOR (FOR (HOUR THE LAST))]
                                ))))]
)

and 

(HAVE PN
    [SUB HE]
    [WHAT (LOOK NN
              [SUB ANOTHER]
              [AT (AT (MAN THE (IN (COAT THE TRENCH))
                          (FOLLOW PPC
                              [SUB !match_to_head_noun]
                              [OBJ HIM]
                              [FOR (FOR (HOUR THE LAST))]
                          )))]
          )]
)




Merging partial parses is the other complication mentioned in the general control structure presented in section 2.2. Step 6 was "Reset partial-parse-list to a list of the partial parses formed in step 4." What actually occurs is that Reader examines the list of partial parses formed in step 4 and modifies it by merging any partial parses whose stacks have the same top structure. Partial-parse-list is then reset to the modified list.
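A minimal Python sketch of the merge, under simplifying assumptions: a stack is represented by a hypothetical `Stack` record holding its top structure (here just a string, compared with ==) and a tuple of branches below it. Merging groups partial parses by top structure; `pop_top` shows how removing the top of a branched stack yields one separate stack per branch, loosely mirroring what the text says SHIFT does automatically.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stack:
    """A partial-parse stack: its top structure plus the branch(es) beneath it."""
    top: str
    below: Tuple["Stack", ...] = ()


def merge_partial_parses(stacks):
    """Merge partial parses whose stacks have identical top structures.

    Each group becomes one stack with a shared top and all of the group's
    branches underneath, so later words are applied to it only once.
    """
    groups = {}
    for stack in stacks:
        groups.setdefault(stack.top, []).extend(stack.below)
    return [Stack(top, tuple(branches)) for top, branches in groups.items()]


def pop_top(stack):
    """Remove the shared top, separating the stack into one stack per branch."""
    return list(stack.below)
```

Applied to the two partial parses for "at" above, the sketch produces a single stack topped by PREP: AT with two branches, as in the merged diagram.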

The merging of partial parses is similar (in effect) to the use of a well-formed substring table (WFST) by parsers which use backup to achieve non-determinism rather than parallel processing. A well-formed substring table [Kuno 63] is a collection of parsed sentence constituents. When a parser using a WFST backs up, it avoids reparsing sentence constituents by picking constituents it has already parsed out of the WFST. Similarly, in a parallel processing environment, the merging of partial parses avoids the reparsing of constituents by allowing each parsed constituent to be shared by every active partial parse which can use it.
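For comparison, here is a small Python recognizer that keeps a well-formed substring table. It is a generic memoized top-down recognizer over an invented toy grammar, not Kuno's or any particular parser's code: the table maps (symbol, start position) to the set of positions where that constituent can end, so each constituent is computed once and then reused wherever another rule asks for it.

```python
# Toy grammar and lexicon, invented for this illustration (no left recursion).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
}
LEXICON = {"the": "Det", "a": "Det", "man": "N", "coat": "N", "saw": "V", "in": "P"}


def recognize(words, start="S"):
    table = {}      # the WFST: (symbol, start position) -> frozenset of end positions
    builds = []     # every constituent actually computed, in order

    def ends(symbol, i):
        key = (symbol, i)
        if key in table:              # already parsed: reuse it instead of reparsing
            return table[key]
        builds.append(key)
        found = set()
        if symbol in GRAMMAR:
            for rhs in GRAMMAR[symbol]:
                positions = {i}
                for part in rhs:
                    positions = {e for p in positions for e in ends(part, p)}
                found |= positions
        elif i < len(words) and LEXICON.get(words[i]) == symbol:
            found.add(i + 1)
        table[key] = frozenset(found)
        return table[key]

    return len(words) in ends(start, 0), table, builds
```

For "the man saw a man in the coat", the noun phrase starting at "a" is computed once and then shared by both VP rules, which is the saving the WFST provides.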


4.5  Other  parsers 

A considerable amount of work has been done in the field of natural language parsing. Much of this work has concentrated on syntax based parsers. These have evolved from simple systems implementing context free grammars to rather complex systems motivated by transformational grammar considerations. Such parsers have grammars which consist of a context free grammar, along with a set of rules for modifying the parse tree built by the context free component. The parse tree may be modified while it is being constructed [Woods 73], or after it has been completed [Sager 73]. This section examines the differences between some of these systems and Reader.

Reader's organization is similar to these systems in that we can view Format as the transformational component, and the grammar programs as the context free component. The differences in the systems lie primarily in the "context free" component. The first difference is that the grammar programs are more powerful than a context free grammar. Consider the sentence "Only one man was found who could speak English." In this sentence, "who could speak English" modifies "man". Reader parses the sentence by dividing it into a stack of two structures. When the stack is Collapsed, the top structure is attached to the bottom structure, which results in the proper modification. This modification cannot be expressed in a strictly context free grammar.

A more important difference lies in the way the "context free" component operates. The grammars for most syntax based parsers consist of a description of legal sentence structures. The grammar's application to a sentence results in a series of choices about which kind of constituent should be built at a particular point in the parse. Each system makes some effort to diminish the number of unsuccessful guesses. For example, Woods allows the grammar writer to "recommend" what guess to make at any point in the parse. Winograd's grammar⁵ attempts to use the information gained from a failed guess at a decision point to allow it to choose intelligently from the remaining choices at the decision point.


5 The grammar in Winograd's parser also consists of a set of programs. However, the programs deal solely with the construction of a parse tree, and are not oriented towards building structures that can represent more than one parse tree at a time.




Reader's grammar consists of a set of programs which determine the different ways a word may be added to a parse in a given configuration. The two methods are similar in that the guesses the older parsers make correspond to the guesses Reader must make in deciding which way to add a word to a partial parse. The difference in the methods is that Reader provides a framework (the stack) and a means (the grammar programs) for writing grammars that diminish the number of ways a word can be applied to a partial parse while still maintaining a substantial grammar. In most cases the grammar programs will apply a word class to a parse in only one way. However, a word which belongs to more than one word class will generally⁶ be applied to a parse once for each word class it belongs to.

It can be argued that since all the more recent systems have the power of Turing machines, they can perform any algorithm, including those that Reader carries out. A simple answer to this is "Ah, but they don't". The reason they don't is that in many of the systems the "full power of a Turing machine" is used only to modify, as opposed to help build, the parse trees generated by the context free component. In other words, the Turing machine comes in after all the guessing has been done.

The methods used by Reader to avoid nondeterminism include a mechanism used in the ATN parser described in [Woods 1970]. Woods' parser is partially based on a finite state machine, and the method referred to involves the technique of making an arbitrary nondeterministic finite state machine deterministic by introducing several new states. Some of Reader's strategies can be viewed in this light, but most cannot, since they are involved with eliminating nondeterminism from situations which involve pushdown operations in the ATN formalism.
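The finite state technique referred to is the standard subset construction. The Python sketch below is a textbook version, not code from Woods' parser or from Reader; it assumes an NFA without ε-moves, given as a dict from state to a dict from symbol to a set of successor states. Each DFA state is the set of NFA states that could be active at once, so the resulting machine never has to guess. The example NFA accepts strings over {a, b} that end in "ab".

```python
from collections import deque


def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon moves.

    nfa maps state -> {symbol: set of next states}. Each DFA state is a
    frozenset of NFA states; the empty frozenset acts as a dead state.
    """
    start_set = frozenset([start])
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            target = frozenset(s for q in current for s in nfa.get(q, {}).get(symbol, ()))
            dfa[current][symbol] = target
            if target not in dfa:
                queue.append(target)
    dfa_accepting = {state for state in dfa if state & accepting}
    return dfa, start_set, dfa_accepting


def accepts(dfa, start, accepting, word):
    """Run the deterministic machine: exactly one successor per symbol."""
    state = start
    for symbol in word:
        state = dfa[state][symbol]
    return state in accepting
```

As the text notes, this construction removes guessing only within the finite state part of an ATN; it cannot decide when to "pop" from a pushdown operation.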

6 Exceptions  are  single  applications  for  words  which  are  both  conjunctions  and 
prepositions,  and  words  which  are  both  nouns  and  modifiers. 




Here is a concrete example. Section 3.2.2 explains how Reader parses simple relative clauses deterministically, using the example sentence "The city people hate is Tokyo". A nondeterministic ATN would begin parsing the sentence by attempting to find a noun phrase. It would have to guess whether to find "the city people" or "the city people hate". The guess consists of deciding when to "pop" up from the "push" of finding a noun phrase: exactly the kind of guess that a finite state machine transformation cannot help.

Another advantage listed for ATNs is the use of registers to make "...tentative decisions about the sentence structure and then change one's mind later in the sentence without backtracking." This is obviously a good feature for a parser to have, and seems equivalent to Reader's method of representing both sides of a decision while reserving the right to choose one or the other (without backtracking) later in the sentence. In Reader, this allows one to parse relative clauses and conjunctions deterministically, delay attaching various parse structures until more information is gathered about the reason for the attachment (thereby reducing the combinatorics of the attachment), combine different word class usages of a single word into one parse, etc. In contrast, [Woods 1970] contains two examples of the tentative decision method at work, which occur in the parsing of the sentence "John was believed to have been shot." The first decision is that "was" is the main verb of the sentence, which is later revised to "believe" is the main verb and "was" is an auxiliary verb. The second is the decision that "John" is the subject of "was", revised later to "John" is the object of "believe", and revised still later to "John" is the object of "shot". In Reader's formalism, all these "decisions" are made and revised trivially. The final stack to collapse is:




    VERB: ((SHOOT ED) (BEEN) (HAVE))
    NOUN1: !match_to_sub
2.  FUNCTION: INF

    VERB: ((BELIEVE ED) (BE BE 3SP))
    NOUN1: JOHN
1.  FUNCTION: MAIN


The decision to make "was" a helping verb is accomplished by simply adding "believed" to structure 1. There is no need to assume what case "John" fills until the structure it is in is Formatted. Attaching an INF structure whose VERB is passive to a structure with a passive verb which accepts a clause entails removing the first noun in the latter structure, installing it as the first noun of the INF structure, and then attaching the INF structure as the clause case. When the INF structure is Formatted, "John" is made the object of "shot". The parse is,


(BELIEVE PN
    [WHAT (SHOOT NP
              [OBJ JOHN]
          )]
)
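The attachment rule and the later formatting step can be sketched in Python. The dict encoding and the `passive` and `takes_clause` flags are assumptions made for this illustration; Reader's own structures and tests are richer. The sketch reproduces the parse above, with "John" ending up as the OBJ of SHOOT.

```python
def attach_passive_inf(main, inf):
    """Attach a passive INF structure to a passive, clause-taking verb structure.

    Hypothetical sketch of the rule in the text: the main structure's first
    noun moves into the INF, which then becomes the clause (WHAT) case.
    """
    if not (main.get("passive") and main.get("takes_clause") and inf.get("passive")):
        raise ValueError("attachment rule does not apply")
    main = dict(main)
    inf = dict(inf, NOUN1=main.pop("NOUN1"))   # replaces the !match_to_sub placeholder
    main["WHAT"] = inf
    return main


def format_passive(structure):
    """Formatting step: in a passive structure the first noun becomes the OBJ."""
    out = {"VERB": structure["VERB"]}
    if structure.get("passive") and "NOUN1" in structure:
        out["OBJ"] = structure["NOUN1"]
    if "WHAT" in structure:
        out["WHAT"] = format_passive(structure["WHAT"])
    return out
```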


There is at least one other parser under development that also tries to avoid needless guessing. It is being written by Marcus [Marcus 75] and is based on the belief that "...the structure of natural language provides enough and the right information to determine exactly what to do next at each point of the parse." The claim is that the parser will be able to avoid guessing what to do at a decision point because there is really only one acceptable choice. The system is still being written, so it is too early to comment on it. However, it seems that this approach will encounter problems when working with a sufficiently large grammar and words that can assume more than one syntactic category.




Some more recent parsing systems have been developed which deemphasize the role that syntax plays in the parsing process. Naturally, such parsers do not produce a "classical" parse tree, but instead produce a structure which is said to represent the "meaning" of the sentence being parsed. Examples of this type of work may be found in [Riesbeck 74] and [Wilks 73]. As this work has come after the more syntax oriented parsers discussed above, we should explain why we have rejected this approach.

The main reason is our belief that most semantic processing will be more expensive than syntactic processing in a rich environment. Therefore, it is desirable to use syntax to minimize the number of semantic interactions that need be considered. This contrasts with (for example) Riesbeck's work, in which he says "the functions of the analyzer to be described here ask questions about the relationship of words and concepts." In his analyzer, the process has been reversed: semantics and deduction are used to determine which words interact, and syntax is used only later, if at all, to ensure that a proposed modification between words is permitted. If one limits oneself to simple sentences, the added expense of using semantics instead of syntax to decide whether two words interact will not be overwhelming, since the possible interactions in a simple sentence will be few in number. However, the number of possible interactions to be examined semantically grows exponentially with the complexity of the sentence, so it seems that these methods will not be practical in a rich environment (in which there are many possible relationships between almost all words and concepts) which has to deal with complicated sentences.




5. The Interpreter

A brief overview of the Interpreter is given in sections 1.2.2 and 1.6.2. Essentially, it is a computer program which attempts to understand natural language. There are many other computer systems which would make the same claim. The points of interest in all programs of this type are:

1. The representation used for the information contained in the natural
   language. For the Interpreter, this is the program specification.

2. The representation(s) used for the knowledge base needed to
   understand the natural language.

3. The methods used for bringing parts of the knowledge base to bear
   on a particular task.

The first point is covered in Section 5.1. Examples of different program specification types are given, along with an example which illustrates how several components fit together to describe a computer program. The section also discusses the representation of users' replies which are not incorporated into the program specification.

Section 5.2 introduces "concepts" and "definitions", the two representation units in the Interpreter's knowledge base. The simplest type of concepts are those which are abstractions of components in the specification. An example of such a concept is #ADD, which refers to the concept of adding up several numbers. Information included in the #ADD concept is:

#ADD can be instantiated as a step in the program specification.

#ADD takes two or more arguments.

The arguments should be numbers. But an exception occurs when
there is one argument which is a set of numbers. In that case,
the numbers in the set should be considered the arguments of the
#ADD.
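The constraints listed for #ADD translate naturally into an argument check. The Python function below is a hypothetical illustration (numbers as Python numbers, a set as a `set` or `frozenset`), not the Interpreter's representation of the concept.

```python
from numbers import Number


def add_arguments(args):
    """Return the argument list an #ADD should use, enforcing the concept's constraints."""
    args = list(args)
    # The stated exception: a single set argument stands for its elements.
    if len(args) == 1 and isinstance(args[0], (set, frozenset)):
        args = list(args[0])
    if len(args) < 2:
        raise ValueError("#ADD takes two or more arguments")
    if not all(isinstance(a, Number) for a in args):
        raise TypeError("#ADD arguments should be numbers")
    return args
```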


The  Interpreter 




Definitions provide instructions for mapping English word strings into concepts. The definition of "sum" contains information which allows the Interpreter to map "The program sums up the last three numbers." into an #ADD which is a step of "the program" and whose arguments are "the last three numbers".
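As a rough illustration of what a definition supplies, the Python sketch below maps a simplified parse of "The program sums up the last three numbers." onto an #ADD. The table layout and the case names it maps from (SUB, OBJ) are assumptions made for this example; the Interpreter's definitions carry considerably more information.

```python
# Hypothetical definition table: verb -> concept plus case-to-descriptor mapping.
DEFINITIONS = {
    "SUM": {"concept": "#ADD", "slots": {"SUB": "step-of", "OBJ": "args"}},
}


def apply_definition(parse):
    """Map a simplified parse structure onto a concept instance, or return None."""
    definition = DEFINITIONS.get(parse["VERB"])
    if definition is None:
        return None
    instance = {"concept": definition["concept"]}
    for case, descriptor in definition["slots"].items():
        if case in parse:
            # The filler is still an English phrase; matching must later relate
            # it to components of the program specification.
            instance[descriptor] = parse[case]
    return instance
```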

The task of relating a phrase like "the last three numbers" to a specific component (or components) in the program specification is referred to as matching. Section 5.4 covers the matching process, explaining how the information contained in concepts and definitions is used during matching.

The primary goals of the processing performed by the Interpreter are conceptually very simple, and sections 5.2, 5.4 and 5.3 (which explains the Interpreter's processing cycle to provide background for section 5.4) should be read with them in mind. The goals, upon receiving a parse structure, are:

1. Determine which definitions can be applied to the parse
   structure, and therefore which concepts the parse structure is
   invoking.

2. Find or create referents in the program specification for the
   descriptor slots of the concepts the parse has been reduced to.

3. Incorporate the appropriate concepts into the program
   specification.

Section  5.5  explains  how  definitions  and  concepts  are  used  to  provide  the  measure 
information  necessary  for  the  interface  between  Reader  and  the  Interpreter.  The 
final  section  mentions  some  work  remaining  to  be  done. 




5.1  The  results  of  interpretation 

5.1.1  The  program  specification 

The program specification contains a record of everything the user has said (and the Interpreter has inferred) which is relevant to the description of the program being written. The parser/interpreter uses it as a data base for matching, the parser/interpreter interface, etc. This section describes the format of the specification. Later sections will show how it is utilized by the parser/interpreter.

The principal result of the Interpreter is the program specification. The program specification¹ represents a computer program, and can be viewed as a high level programming language. It consists of a connected set of components. Such a data structure has been labeled an "entity-attribute-value data structure" in [Heidorn 74], and a "set of conceptual entities with associated descriptions" in [Bobrow 76].

The description of a component is a collection of descriptor/value pairs which specify the actions and structure of the component. For example, a component may have as its description,

A0358 
class:  ALG 
type:  OUTPUT 
args: "Ready" 
step-of:  A0367 

which means that it is an ALGorithm component that should be mapped into an "output" operation in the target language (e.g., WRITE in Fortran, PRINT in Lisp, etc.).


1 The program specification semantics were developed with Jorge Phillips.




The argument of the output is the string "Ready". The step-of descriptor indicates the position of the component in the specification; it is one of the steps of an ALGorithm component denoted by A0367.

Each descriptor has an inverse associated with it. For example, if a component X is in the steps descriptor of a component Y, this fact can be derived by examining either X (whose step-of descriptor contains Y) or Y.
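One simple way to get this two-way access is to store each descriptor together with its inverse whenever a value is added. The Python class below is a hypothetical sketch of such an entity-attribute-value store; only the steps/step-of pair is given an inverse here, and the component names are the ones from the example above.

```python
from collections import defaultdict

# Descriptor pairs that are inverses of each other (only steps/step-of in this sketch).
INVERSES = {"steps": "step-of", "step-of": "steps"}


class Specification:
    """Entity-attribute-value store: component -> descriptor -> list of values."""

    def __init__(self):
        self.components = defaultdict(lambda: defaultdict(list))

    def add(self, component, descriptor, value):
        self.components[component][descriptor].append(value)
        inverse = INVERSES.get(descriptor)
        if inverse is not None:
            # Record the inverse so the fact can be found from either component.
            self.components[value][inverse].append(component)

    def get(self, component, descriptor):
        return list(self.components[component][descriptor])
```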

A component belongs to one of two classes: ALGorithm or DATA. Each class is subdivided into several types. Figure 5.1 shows some control structure ALGorithm types.




PROCEDURE
    ARGS: a list of DATA components whose type is BOUND.
    DEFINITION: an ALG component.

SEQ
    STEPS: a list of ALGs to be executed in sequential order.

CASE
    CONDITION: an ALG with a RESULT slot, or a DATA which is the RESULT
               of an ALG.
    STEPS: a list of ALGs to be executed if the CONDITION is TRUE.

COND
    CASES: a list of ALGs whose type is CASE. The first CASE whose condition
           is TRUE is executed; the rest are ignored.

ENUMERATE
    ON: a DATA whose type is SET.
    STEPS: a list of ALGs to be executed sequentially for each element in
           the ON set. The iteration element is represented by the generic
           element of the ON set.

LOOP
    EXITS: a list of ALGs whose type is CASE.
    COUNTER: a DATA of type INTEGER whose value is the number of times
             the LOOP has been executed.
    STEPS: a list of ALGs which includes every CASE in EXITS. The ALGs in
           STEPS are repeatedly executed until the condition of a CASE in
           EXITS is satisfied.

CALL
    PROCEDURE: an ALG of type PROCEDURE.
    ARGS: a list of DATAs which are bound to the args of PROCEDURE.


Figure  5.1 

Control  structure  ALGorithm  types 
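To make the semantics of Figure 5.1 concrete, here is a small Python interpreter for four of the types (SEQ, CASE, COND, ENUMERATE) plus an invented PRIMITIVE type that wraps a Python function. The dict encoding, the `env` dictionary of variable bindings, and the use of Python callables as CASE conditions are all assumptions of this sketch; LOOP and CALL are omitted.

```python
def holds(condition, env):
    """A CASE condition here is either a literal truth value or a callable predicate."""
    return condition(env) if callable(condition) else bool(condition)


def run(alg, env):
    """Execute one ALG component given in a hypothetical dict encoding."""
    kind = alg["type"]
    if kind == "SEQ":
        for step in alg["steps"]:
            run(step, env)
    elif kind == "CASE":
        if not holds(alg["condition"], env):
            return False
        for step in alg["steps"]:
            run(step, env)
    elif kind == "COND":
        # The first CASE whose condition is TRUE is executed; the rest are ignored.
        return any(run(case, env) for case in alg["cases"])
    elif kind == "ENUMERATE":
        for element in env[alg["on"]]:
            env[alg["element"]] = element      # bind the generic element of the ON set
            for step in alg["steps"]:
                run(step, env)
    elif kind == "PRIMITIVE":
        alg["action"](env)
    else:
        raise ValueError(f"unsupported ALG type {kind}")
    return True
```

Returning True from a CASE that fired lets COND stop at the first successful case through `any`, which short-circuits.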


The remaining ALGorithm types can be divided into predicates and primitive operations. The number of these is essentially unlimited, since anything the PSI coding module can code without instructions from the user can be considered primitive. Figure 5.2 provides some samples of the primitive operations and predicates used by the current system.


MAP
    ARG1: a DATA component whose type is MAPPING.
    ARG2: a DATA component.
    ARG3: a DATA component.

    MAP is the program specification primitive for associating one DATA
    component (ARG2) with another (ARG3) via the mapping ARG1. It is a
    generalization of the Lisp PUTPROP command. IMAP corresponds to GETPROP.

IMAP
    ARG1: a DATA component whose type is MAPPING.
    ARG2: a DATA component.
    RESULT: the DATA component that ARG1 maps ARG2 to.

COMPUTE
    ON: a DATA component which is a set.
    RESULT: a DATA component which is a set.
    QUANTIFY: either ALL, SOME or a DATA component which is an integer.
    ASSERTIONS: a list of ALGs which are assertions involving the generic
                element of the RESULT set.

    The RESULT set is a subset of the ON set which consists of all, some or
    any n (depending on the value of QUANTIFY: ALL, SOME or a number n) of
    the elements in the ON set which satisfy the ASSERTIONS list.

INPUT
    ARGS: a list of the DATAs being read in.
    PROMPT: a DATA of type STRING which is output to herald the INPUT.

MEMBER
    ARG1: a DATA component.
    ARG2: a DATA component which is a SET.
    RESULT: a DATA of type BOOLEAN which reflects whether ARG1 is in ARG2.

FORALL
    BINDINGS: a set of DATAs whose type is BOUND.
    PREDICATE: an ALG with a RESULT.
    RESULT: a BOOLEAN which is the truth value of the universal quantification.


Figure  5.2 

Primitive  operations  and  predicates 
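COMPUTE, MAP and IMAP can be approximated in a few lines of Python. This is a hedged sketch: sets are Python sets, assertions are Python predicates, and a mapping is a plain dict. Reading SOME as "any one satisfying element" and n as "the first n satisfying elements in sorted order" are interpretive choices made here, not stated in the figure.

```python
def compute(on, assertions, quantify="ALL"):
    """COMPUTE: the subset of ON whose elements satisfy every assertion."""
    satisfying = sorted(x for x in on if all(test(x) for test in assertions))
    if quantify == "ALL":
        return set(satisfying)
    if quantify == "SOME":
        return set(satisfying[:1])           # assumption: any single witness will do
    if isinstance(quantify, int):
        return set(satisfying[:quantify])    # assumption: the first n, in sorted order
    raise ValueError(f"bad QUANTIFY value {quantify!r}")


def map_put(mapping, arg2, arg3):
    """MAP: associate ARG3 with ARG2 under the mapping ARG1 (cf. Lisp PUTPROP)."""
    mapping[arg2] = arg3


def imap(mapping, arg2):
    """IMAP: the DATA that the mapping ARG1 maps ARG2 to (cf. Lisp GETPROP)."""
    return mapping[arg2]
```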


Data structures, like primitive operations, come in any form that the coder is able to handle. Figure 5.3 shows some DATA types and example DATAs.




SET
    ELEMENT: a DATA which is the generic element of the set.

RECORD
    FIELDS: a list of DATA components whose type is FIELD.

FIELD
    DATA: a DATA component which is the contents of the NAME field of a RECORD.
    NAME: the name of the FIELD.
    QUANTIFY: either ALL, SOME or a DATA component which is an integer.


class DATA
type SET            [the empty set]
value PHI

class DATA
type BOOLEAN        [the boolean value TRUE]
value TRUE

class DATA
type RECORD
rep GRAPH
instanceof A0001
assertions (A0002 A0003)

The DATA above illustrates three descriptors any DATA may have.
The REP descriptor indicates that the program designer is referring
to this component by the word "graph". ALG components may also have
REP descriptors. The INSTANCEOF descriptor indicates that the
structure of this component is the same as the structure of the
component which A0001 points to. The ASSERTIONS descriptor contains
a list of assertions about the component.


Figure 5.3

Data structure types and examples


5.1.2  An  example  and  comparison 

This section illustrates how these pieces are combined in a program description. Figure 5.4 contains a short dialogue, the program specification the Interpreter has built from it, and the pretty printed version of the specification.


WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE?

Lessall.

DESCRIBE LESSALL.

Lessall takes a number and a list of numbers as arguments. It returns True if
the number is less than every number in the list. Otherwise it returns False.


[Specification diagram: a PROCEDURE component whose name is the NAME LESSALL, whose
args are two BOUND components, and whose definition is a COND with two CASEs. The
first BOUND is bound to a NUMBER; the second is bound to a LIST whose element is a
NUMBER. The first CASE's condition is a FORALL whose bindings are a BOUND and whose
predicate is an IMPLIES with antecedent MEMBER and consequent LESS; its steps are a
RETURN of TRUE. The second CASE's condition is TRUE; its steps are a RETURN of FALSE.]

If FORALL(B3) IMPLIES(MEMBER(B3 B2)
                      LESS(B1 B3))
Then RETURN(TRUE)
else RETURN(FALSE)

B3 is a variable bound to A1. B2 is a variable bound to A2.
B1 is a variable bound to A3. A3 is a number. A1 is the generic element of A2.
A2 is a list whose generic element is a number.


Figure 5.4

Lessall and its program specification




The top node in the program specification is always a PROCEDURE component. In
this case, it has two arguments, which are bound to a number and a list of numbers
respectively. This structure information is required by the coder, as it enables it to
choose its algorithm based on the data structures the algorithm is meant to
manipulate. The body of the procedure is a COND with two cases. If the condition
("∀X (X ∈ B2 ⇒ B1 < X)", where B1 is the number and B2 the number list) is True,
then True² is returned. If not, the STEPS slot is ignored and the second CASE is
tried. The condition of the second case is True, so any time the first condition does
not obtain, False will be returned. The control structure and data descriptions
beneath the specification diagram are the distillation (as obtained by the
specification pretty printer) of the program description information contained in the
diagram.

The LESSALL program was taken from a paper on the Dedalus system [MANNA 77].
Dedalus is an automatic program synthesis system which uses a formal specification
language as its input, rather than English. Since the Interpreter's output
corresponds to the input of such a system, a comparison between the two is a
useful measure of the effectiveness of the Interpreter. In this case, the two are
virtually identical: the Dedalus input for LESSALL is

LESSALL(X L) <== compute X < all(L)

where X is a number and L is a list of numbers.

The expression X < all(L) means that "...X is less than every member of the list L."
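The COND of Figure 5.4 (equivalently, X < all(L)) can be read as ordinary code. The following minimal Python rendering is offered only to make the semantics concrete. The Interpreter handed its specification to the PSI coder and did not produce Python, and the function name `lessall` is just a transliteration.

```python
from typing import Sequence


def lessall(x: float, numbers: Sequence[float]) -> bool:
    """Return True iff x is less than every member of numbers (X < all(L))."""
    # First CASE: FORALL(B3) IMPLIES(MEMBER(B3 B2) LESS(B1 B3)).
    # Quantifying only over members of the list discharges the MEMBER antecedent.
    if all(x < b3 for b3 in numbers):
        return True
    # Second CASE: its condition is TRUE, so this covers every remaining input.
    return False
```

As with the universal quantifier it mirrors, an empty list makes the first condition vacuously true, so `lessall(5, [])` returns True.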


² To save space, TRUE and FALSE have been used to represent the BOOLEAN
components whose values are TRUE and FALSE.




5.1.3 Meta-comments

Some of the program designer's instructions to the system do not describe the
program, but instead are intended to direct the course of the dialogue.
Comments like,

I don't understand.

What were we talking about?

What did you mean by "the predicate fits"?

Forget about prompts.

do not fit into the program specification, but are meaningful nonetheless. Such
statements are sent to the dialogue expert as a filled-in case frame. The case
frame is actually a concept (next section), and it is filled in in exactly the same way
that concepts are instantiated. The only difference is that instead of being added
to the program specification, the instantiated concept is sent to the PSI dialogue
module for processing.

As an example, we will examine the concept #USER-QUESTION-REQUEST.
Statements like,

Ask about the scene before the concept.

Let's talk about the scene.

Ask me about prompts before asking me about the scene.

Ask me about the structure of the scene first.

which address when and which questions should be asked, are mapped to
#USER-QUESTION-REQUESTs. A #USER-QUESTION-REQUEST is specified by three
descriptors:

QUESTION: either a question type (eg., STRUCTURE), a question (eg.,
(STRUCTURE A0012)) or a component (eg., A0012).

TIME: either BEFORE or AFTER (in which case REFERENT must be present),
or LATER or NOW.

REFERENT: takes the same values as QUESTION.




The interpretation of a #USER-QUESTION-REQUEST is: ask

    {one of}
        all questions of type QUESTION
        this particular QUESTION
        any questions about the component which is QUESTION

    either NOW or LATER, or
    BEFORE or AFTER asking

    {one of}
        all questions of type REFERENT
        this particular question which is REFERENT
        any questions about the component which is REFERENT


Then if A0001 points to the scene, and A0002 to the concept, we have,

Ask about the scene before the concept.

[#USER-QUESTION-REQUEST Question: A0001 Time: BEFORE Referent: A0002]

Let's talk about the scene.

[#USER-QUESTION-REQUEST Question: A0001 Time: NOW]

Ask me about prompts before asking me about the scene.

[#USER-QUESTION-REQUEST Question: PROMPT Time: BEFORE Referent: A0001]

Ask me about the structure of the scene first.

[#USER-QUESTION-REQUEST Question: (STRUCTURE A0001) Time: NOW]
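To make the interpretation above concrete, the following minimal Python sketch applies a #USER-QUESTION-REQUEST to a queue of pending questions. It is a sketch under stated assumptions, not the PSI dialogue module's actual algorithm. The assumptions are these: questions are (slot, component) pairs, component names look like A0001, and any other bare string is a question type. The reordering policy (NOW and LATER move matches to the front or back; BEFORE and AFTER place them around the referent's questions) is my own reading of the text. All names (`UserQuestionRequest`, `matches`, `apply_request`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Question = Tuple[str, str]       # (descriptor slot, component), e.g. ("STRUCTURE", "A0012")
Spec = Union[str, Question]      # a question type, a particular question, or a component
TIMES = {"NOW", "LATER", "BEFORE", "AFTER"}


@dataclass
class UserQuestionRequest:
    question: Spec
    time: str
    referent: Optional[Spec] = None

    def __post_init__(self):
        if self.time not in TIMES:
            raise ValueError(f"unknown TIME {self.time!r}")
        if self.time in ("BEFORE", "AFTER") and self.referent is None:
            raise ValueError("BEFORE and AFTER require a REFERENT")


def matches(spec: Spec, q: Question) -> bool:
    """Does pending question q fall under a QUESTION or REFERENT value?"""
    if isinstance(spec, tuple):          # this particular question
        return spec == q
    if re.fullmatch(r"A\d+", spec):      # any question about this component
        return q[1] == spec
    return q[0] == spec                  # all questions of this type


def apply_request(queue: List[Question], req: UserQuestionRequest) -> List[Question]:
    """Return a reordered copy of the question queue."""
    moved = [q for q in queue if matches(req.question, q)]
    rest = [q for q in queue if not matches(req.question, q)]
    if req.time == "NOW":
        return moved + rest
    if req.time == "LATER":
        return rest + moved
    hits = [i for i, q in enumerate(rest) if matches(req.referent, q)]
    if not hits:                         # referent not pending: leave the order alone
        return list(queue)
    cut = hits[0] if req.time == "BEFORE" else hits[-1] + 1
    return rest[:cut] + moved + rest[cut:]
```

For "Ask about the scene before the concept", `apply_request(queue, UserQuestionRequest("A0001", "BEFORE", "A0002"))` moves every pending question about A0001 ahead of the first question about A0002.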


5.2 The knowledge base

The knowledge base used by the Interpreter consists of two declarative blocks of
knowledge, and a set of programs which make use of the information in them. The
programs are used to construct the specification, using the descriptions contained
in Concepts and Definitions, the two declarative blocks. There is no formal
definition of what constitutes a concept; a concept is anything which the
Interpreter can reason about. Hence there is a concept behind every ALGorithm
and DATA type in the specification, as well as several higher order concepts. A
definition is a means of mapping a sequence of English words into a concept.




5.2.1 Concepts

Concepts express many things, but are oriented towards supplying the information
needed to instantiate and reason about components. Instantiation refers to the
process of creating a component and filling in its descriptors with other components
in the specification, so that it too becomes part of the specification.


The information contained in a concept is:

Descriptors: what descriptors the concept can take, the type
checking constraints the descriptors must obey, questions to ask
if the concept is presented without a necessary descriptor, and
default descriptor values.

Postconditions: what is true after the concept has been
executed.

Side effects: what changes to make to the program specification
when the concept has been recognized.


For an example, consider the concept #MAP. #MAP represents the primitive
operation in the specification which allows the user to associate one DATA with
another. Figure 5.5 contains the #MAP concept.




#MAP

DESCRIPTORS:

    STEPOF
        CHECK1: ISA #ALG
        QUESTIONS: Where does the #MAP belong?

    ARG1
        CHECK1: ISA #MAPPING

    ARG2
        CHECK1: ISA #DATA
        CHECK2: MAP-CHECK2 (ARG1 ARG2 ARG3)
        QUESTIONS: What is being #MAPped?

    ARG3
        CHECK1: ISA #DATA
        CHECK2: MAP-CHECK2 (ARG1 ARG2 ARG3)
        QUESTIONS: What is ARG2 being #MAPped to?

POST-CONDITIONS: (#EQUAL (#IMAP ARG1 ARG2) ARG3)

SIDE-EFFECTS: MAPPING-UPDATE (ARG1 ARG2 ARG3)


Figure 5.5
The #MAP concept


Figure 5.5 shows that a #MAP is specified by four descriptors. Each descriptor has
information associated with it which assists the interpreter in filling in the
descriptor slot. For instance, ARG2 must be a DATA component (#DATA refers to the
concept of a DATA component). The second check provides a more contextual type
check which is used during matching and in the parser/interpreter interface. Since
the check is more complicated than a simple type check (eg., ISA #DATA), a program
(MAP-CHECK2) is called which returns True or False, depending on whether ARG1 is
a MAPPING which maps components of type ARG2 into ARG3. If a #MAP is to be
instantiated and ARG2 is not present, then the question "What is the second
argument of the map?", represented by (ARG2 A0001) where A0001 points to the
instantiated MAP, is asked. SIDE-EFFECTS consists of things which should be done
whenever a component is instantiated. In the case of #MAP, SIDE-EFFECTS
consists of a program (MAPPING-UPDATE) which updates the range and domain of
ARG1 if necessary. The POST-CONDITIONS are what is true after the concept has
been executed. Section 5.4 on matching explains how the POST-CONDITIONS and
CHECKS are used.
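The following minimal Python sketch shows how the descriptor part of Figure 5.5 could drive instantiation. Each present filler must pass its ISA check, and each missing slot that has a QUESTIONS entry yields a pending question such as (ARG2 A0001). The `Descriptor` class, the `instantiate` function, and the encoding of fillers as dicts with an `isa` set are hypothetical. CHECK2 (MAP-CHECK2), POST-CONDITIONS and SIDE-EFFECTS are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Descriptor:
    isa: Optional[str] = None        # CHECK1: ISA <concept>
    question: Optional[str] = None   # QUESTIONS: asked when the slot is missing


# Descriptor part of Figure 5.5; CHECK2 (MAP-CHECK2) is omitted in this sketch.
MAP = {
    "STEPOF": Descriptor("#ALG", "Where does the #MAP belong?"),
    "ARG1": Descriptor("#MAPPING"),
    "ARG2": Descriptor("#DATA", "What is being #MAPped?"),
    "ARG3": Descriptor("#DATA", "What is ARG2 being #MAPped to?"),
}


def instantiate(concept: Dict[str, Descriptor], fillers: Dict[str, dict],
                new_id: str) -> Tuple[dict, List[Tuple[str, str]]]:
    """Fill descriptor slots from `fillers`, type-check them, and collect
    questions such as (ARG2 A0001) for missing slots that have a question."""
    component = {"id": new_id}
    pending = []
    for slot, desc in concept.items():
        filler = fillers.get(slot)
        if filler is None:
            if desc.question is not None:
                pending.append((slot, new_id))
            continue
        if desc.isa is not None and desc.isa not in filler["isa"]:
            raise TypeError(f"{slot} of {new_id} must be ISA {desc.isa}")
        component[slot] = filler["id"]
    return component, pending
```

ARG1 has no QUESTIONS entry, so a missing ARG1 produces no question in this sketch. In the system, it is instead supplied by a definition's DEFAULTS slot (see GET-MAPPING in the definition of "mark").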

Figure 5.6 shows the Interpreter's concepts of #DATA and #SET.


#DATA

DESCRIPTORS:

    INITIAL-VALUE
        CHECK1: ISA #DATA
        QUESTIONS: What is the initial value of the DATA?

    VALUE
        CHECK1: ISA #DATA

PREPOSITIONS:

    WITH
        CHECK1: ISA #DATA
        MEANING: (#ASSOCIATE data object)

    IN
        CHECK1: ISA #SET
        MEANING: (#MEMBER data object)


#SET

DESCRIPTORS:

    ELEMENT
        CHECK1: ISA #DATA
        DEFAULT: instantiation of a DATA whose REP is ELEMENT.

    SIZE
        CHECK1: ISA #INTEGER

CLASSIFIERS: ELEMENT

PREPOSITIONS:

    OF
        CHECK1: GENERIC-ELEMENT()
        MEANING: ELEMENT


Figure 5.6

The #DATA and #SET concepts


The concepts in Figure 5.6 both have information about prepositional modifiers.
Such information is usually associated with individual word definitions, but when the
modification is standard for the concept, regardless of how it is expressed in
English, the information is tied to the concept itself. The "in" modification for #DATA
means that every time a word which maps to a #DATA is modified by a prepositional
phrase whose preposition is "in" and whose object is a #SET, the meaning of the
modification is that the component the word matches to is a member (represented
by the Interpreter concept #MEMBER) of the component the preposition object
matches to. The "of" modification for #SET is slightly different in that the meaning
of the modification is a descriptor of #SET rather than a concept. This means that
the object of the preposition fills that slot in the #SET description. The check for
"of" is a program which makes sure that the preposition object is a plural noun
which is a #DATA.

The CLASSIFIERS slot is similar to PREPOSITIONS in that it appears in definitions,
rather than concepts, except in cases in which the meaning of the classifier is the
same for all nouns mapping to the concept. For #SET, the CLASSIFIERS slot says
that if a noun modifies a noun mapping to #SET, and the modifying noun satisfies the
checks for ELEMENT, then it fills the ELEMENT descriptor of the #SET. Eg., in "the
integer list", "integer" is a classifier of "list", which maps to #SET. Since "integer" is
a #DATA, it is assumed to be the generic element of the list.

To avoid needless duplication of information, the concepts are arranged in a
refinement tree in which every concept shares all the information associated with
its parent in the tree. #SET is a refinement of #DATA. Thus when checking #SET
for information, all the information connected to #DATA applies; eg., if A0424 has
just been instantiated as a set, the question "What is the initial value of A0424?"
will be pending. Of course, if the system can answer the question (perhaps A0424
is the argument of an INPUT), it will never be asked of the program designer.
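The refinement tree can be sketched in a few lines of Python: every lookup walks from a concept up through its parents. The sketch reproduces the two behaviours just described. First, a new #SET inherits the INITIAL-VALUE question of #DATA. Second, the "in" preposition defined on #DATA applies to nouns mapping to #SET. The dictionary layout and function names are hypothetical, and the GENERIC-ELEMENT check for "of" is simplified to an ISA #DATA test.

```python
from typing import Iterator, List, Optional

# Part of Figure 5.6; prepositions map to (required object concept, meaning).
CONCEPTS = {
    "#DATA": {"parent": None,
              "questions": {"INITIAL-VALUE": "What is the initial value of {}?"},
              "prepositions": {"in": ("#SET", "#MEMBER"),
                               "with": ("#DATA", "#ASSOCIATE")}},
    "#SET": {"parent": "#DATA", "questions": {},
             "prepositions": {"of": ("#DATA", "ELEMENT")}},   # simplified check
    "#INTEGER": {"parent": "#DATA", "questions": {}, "prepositions": {}},
}


def ancestors(concept: Optional[str]) -> Iterator[str]:
    """Yield the concept itself, then each parent up the refinement tree."""
    while concept is not None:
        yield concept
        concept = CONCEPTS[concept]["parent"]


def isa(concept: str, target: str) -> bool:
    return target in ancestors(concept)


def pending_questions(concept: str, component: str) -> List[str]:
    """Questions a new instance inherits from its concept and all ancestors."""
    return [q.format(component)
            for c in ancestors(concept)
            for q in CONCEPTS[c]["questions"].values()]


def preposition_meaning(head: str, prep: str, obj: str) -> Optional[str]:
    """Meaning of `head prep obj`, using the nearest concept that defines `prep`."""
    for c in ancestors(head):
        entry = CONCEPTS[c]["prepositions"].get(prep)
        if entry is not None:
            required, meaning = entry
            return meaning if isa(obj, required) else None
    return None
```

For example, `pending_questions("#SET", "A0424")` yields the single inherited question about A0424's initial value.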




Concepts are also used to capture regularities in language. English provides many
different ways to express the same thought. For example, X is a function of Y can
be stated as,

X depends on Y.

X is calculated from Y.

X is determined from Y.

X is calculated on the basis of Y.

X can be found from Y.

X is based on Y.

X is obtained from Y.

X is related to Y.

X is found by examining Y.


As an aid in writing definitions, it is useful to have all these phrases map into a
single manipulable entity, namely the concept of #CALCULATION. #CALCULATION has
two descriptors: ARG1, which is a #DATA, and ARG2, which is a #PREDICATE. Methods
for using concepts like #CALCULATION are explained in the following section on
definitions.
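The following toy Python sketch illustrates the idea that many surface phrasings collapse into one #CALCULATION frame. It uses regular expressions, which is a swapped-in technique: the Interpreter reached #CALCULATION through word definitions such as the one for "base", not through pattern matching on strings. The pattern list and `to_calculation` are hypothetical.

```python
import re

# Toy paraphrase table: X fills ARG1 (the #DATA), Y fills ARG2 (the #PREDICATE).
CALCULATION_PATTERNS = [
    r"(?P<x>.+) depends on (?P<y>.+)",
    r"(?P<x>.+) is (?:calculated|determined|obtained) from (?P<y>.+)",
    r"(?P<x>.+) is calculated on the basis of (?P<y>.+)",
    r"(?P<x>.+) can be found from (?P<y>.+)",
    r"(?P<x>.+) is (?:based on|related to) (?P<y>.+)",
    r"(?P<x>.+) is found by examining (?P<y>.+)",
]


def to_calculation(sentence: str):
    """Map one of the listed phrasings to a #CALCULATION frame, else None."""
    s = sentence.rstrip(".")
    for pattern in CALCULATION_PATTERNS:
        m = re.fullmatch(pattern, s)
        if m:
            return {"concept": "#CALCULATION", "ARG1": m["x"], "ARG2": m["y"]}
    return None
```

Whatever phrasing is used, downstream code only ever sees the single frame. That uniformity is the benefit the text attributes to the concept.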

5.2.2 Definitions

Definitions are used to map from English words to concepts. At the same time, they
provide the parser with the measure information it needs.

The information contained in a definition is:

Concept: What concept the definition maps to.

Word: What word the definition is a definition of.

Case-Descriptor relationships: Which verb cases can be used to
fill the descriptor slots of the concept. Which cases must be, or
are preferred to be, present for the definition to succeed.

Prepositions: Which descriptors prepositions can fill.

Conjunctions: Which descriptors conjunctions can fill.

Defaults: Default values for some descriptor slots.

Clauses: Which descriptors can be filled by clauses not
introduced by conjunctions.


Figure 5.7 contains an example.


!#MARK

DEFINITION-OF: MARK
ISA: #MAP

CASES: (SUB STEPOF) (OBJ ARG2 Must) (IOB ARG3 Preferred)
PREPOSITIONS:

    AS
        CHECK1: ISA #DATA
        MEANING: ARG3

DEFAULTS:

    ARG1: GET-MAPPING (MARK)


!#COLLECTION

DEFINITION-OF: COLLECTION
ISA: #SET


Figure 5.7

Definitions of mark and collection


Suppose that the interpreter receives the sentence "Mark the scene
'necessary'". The parse is

(MARK NN
   [SUB YOU*]
   [OBJ (SCENE THE)]
   [IOB "necessary"]
)

The definition will successfully map the sentence into the concept if all the
requirements for the concept descriptors are met. Following the CASES slot, YOU* is
matched to an ALGorithm component as the STEPOF descriptor, and "the scene" and
"necessary" are matched to #DATAs as the ARG2 and ARG3 of the #MAP to be
instantiated. The "Must" in the OBJ mapping indicates that the OBJ case must be
present for the definition to succeed. Similarly, the "Preferred" in the IOB case
means that the IOB case is strongly preferred to be present, but not necessary. This
means that, using the verb "mark", something can be marked without specifying
what the marking is, but a marking cannot be specified without mentioning what is
being marked. ARG1 of the #MAP comes from the DEFAULTS slot of the definition: it is
the value of a program (GET-MAPPING) which finds the MAPPING component to be used
for "mark", or creates one if this is the first instance of "mark" in the program
specification.
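A minimal Python sketch of how the CASES slot of the "mark" definition might be applied to a parse follows. A missing "Must" case makes the mapping fail, while a missing "Preferred" case only lowers a preference score. The scoring scheme is an assumption, because the text says only that such cases are "strongly preferred". `map_cases`, `MAPPINGS` and `get_mapping` are hypothetical stand-ins for the system's internals.

```python
from typing import Dict, List, Optional, Tuple

# CASES slot of !#MARK (Figure 5.7): (verb case, descriptor, strength)
MARK_CASES = [("SUB", "STEPOF", None), ("OBJ", "ARG2", "Must"), ("IOB", "ARG3", "Preferred")]


def map_cases(cases: List[Tuple[str, str, Optional[str]]],
              parse: Dict[str, str]) -> Optional[Tuple[Dict[str, str], int]]:
    """Map the verb cases of a parse onto descriptor slots.

    Returns None if a "Must" case is absent; each absent "Preferred" case
    lowers the returned preference score by one instead of failing.
    """
    slots: Dict[str, str] = {}
    score = 0
    for case, descriptor, strength in cases:
        if case in parse:
            slots[descriptor] = parse[case]
        elif strength == "Must":
            return None
        elif strength == "Preferred":
            score -= 1
    return slots, score


MAPPINGS: Dict[str, dict] = {}


def get_mapping(word: str) -> dict:
    """DEFAULTS ARG1: find the MAPPING component for `word`, creating it on first use."""
    if word not in MAPPINGS:
        MAPPINGS[word] = {"class": "DATA", "type": "MAPPING", "name": word.upper()}
    return MAPPINGS[word]
```

For "Mark the scene 'necessary'", all three cases are present, so the score stays at 0. Dropping the IOB yields score -1, and dropping the OBJ makes the mapping fail.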


Nouns are defined similarly to verbs, with the exception that the case information is
missing (it is usually replaced by classifier information). Figure 5.7 contains the
Interpreter's definition of "collection".


Figure 5.8 contains two definitions which utilize the #CALCULATION concept.


!#CLASSIFY

DEFINITION-OF: CLASSIFY
ISA: #CALL

CASES: (SUB STEPOF) (OBJ ARGS)

CLAUSES:

    CHECK1: ISA #CALCULATION
    MEANING: PROCEDURE (extract ARG2)

DEFAULTS:

    RESULT: instantiation of a DATA whose REP is CLASSIFICATION.


!#BASE

DEFINITION-OF: BASE
ISA: #CALCULATION
CASES: (OBJ ARG1)

PREPOSITIONS:

    ON
        CHECK1: ISA #PREDICATE
        MEANING: ARG2


Figure 5.8

Definitions for "classify" and "base"


Consider the processing of the sentence "It classifies the scene based on whether
it fits the concept." The phrase "based on whether it fits the concept" is mapped to a
#CALCULATION whose ARG2 is the predicate "it fits the concept". It is also a
clause which modifies "classify". (Anticipating section 5.5 on the parser/interpreter
interface, we note that the parser knows "based" modifies "classify" rather than
"scene" precisely because one modification is meaningful (all the
words -> definitions -> concepts maps succeed) and the other is not.) According to
the definition, a clause can modify "classify" if it is a #CALCULATION. If it is, the
modification instructions are to fill the PROCEDURE slot of the "classify" #CALL with
ARG2 of the #CALCULATION. This work is done during Formatting, so the parse for
the sentence is,


(IMP (CLASSIFY NN
        [STEPOF YOU*]
        [ARGS (SCENE THE)]
        [PROC (FIT NN
                 [ARGS IT]
                 [ARGS (CONCEPT THE)])]))


Had the sentence been,

Classify the scene on the basis of whether it fits the concept.
Classify the scene as a function of whether it fits the concept.
Classify the scene depending on if it fits the concept.
etc.

the result would have been the same.
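The clause attachment just described reduces to a check followed by a slot copy. The following minimal Python sketch applies the CLAUSES entry of the "classify" definition to a #CALCULATION built from the definition of "base". The dict encoding and the function names `base_on` and `attach_clause` are hypothetical. ARG1 of the #CALCULATION is left unfilled because the attachment does not use it.

```python
def base_on(predicate):
    """!#BASE (Figure 5.8): "based on P" is a #CALCULATION whose ARG2 is P."""
    return {"concept": "#CALCULATION", "ARG1": None, "ARG2": predicate}


def attach_clause(call: dict, clause: dict) -> bool:
    """CLAUSES entry of !#CLASSIFY: if the clause ISA #CALCULATION, its ARG2
    fills the PROCEDURE slot of the #CALL. Returns False if the check fails."""
    if clause.get("concept") != "#CALCULATION":   # CHECK1: ISA #CALCULATION
        return False
    call["PROCEDURE"] = clause["ARG2"]            # MEANING: PROCEDURE (extract ARG2)
    return True
```

Any of the paraphrases listed above would yield the same #CALCULATION, which is why the resulting #CALL is identical in each case.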


Many times, unknown words are used to refer to undefined predicates or subparts
of the program being described. Since it would be unreasonable to expect all words
to be included in the system, and since the definitions of such words are often
inferable from context, the interpreter uses a "template" definition to try to create a
definition for any unknown words which are used in the dialogue.




Here is an example:

The program reads a graph and a node. A graph is a set of pairs.
Each pair consists of two nodes, which are primitive. The program
prints a list of all the nodes which can be reached from the input
node.


When the interpreter encounters the last sentence, it has no information about
"reach" other than that it is a verb. Because it is being used as the main verb of a
clause which modifies a noun, the Interpreter assumes that it represents a
predicate which the program designer has yet to define. The "template" predicate
definition and its instantiation for "reach" are shown in Figure 5.9.


PREDICATE-TEMPLATE
DEFINITION-OF: -

ISA: #PROCEDURE

CASES: (SUB ARGS) (OBJ ARGS)

PREPOSITIONS:

    match
        CHECK1: ISA #DATA
        MEANING: ARGS


!#REACH

DEFINITION-OF: REACH
ISA: #PROCEDURE
CASES: (SUB ARGS) (OBJ ARGS)
PREPOSITIONS:

    FROM
        CHECK1: ISA #DATA
        MEANING: ARGS


Figure 5.9

A template definition and its instantiation


The template definition maps to a #PROCEDURE. The "match" in its PREPOSITIONS
slot matches any preposition that the Interpreter cannot attach to anything else.
The resulting definition of "reach" asserts that "reach" is a PROCEDURE, and that
the preposition "from" can be used to introduce one of its arguments.
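Instantiating the template amounts to copying it, naming the word, and turning each otherwise-unattachable preposition into a concrete PREPOSITIONS entry. The following is a minimal Python sketch of that step. The dict layout, `MATCH_ENTRY` and `instantiate_template` are hypothetical.

```python
import copy
from typing import Iterable

# Figure 5.9's PREDICATE-TEMPLATE; its "match" preposition entry is kept separately.
PREDICATE_TEMPLATE = {
    "DEFINITION-OF": None,
    "ISA": "#PROCEDURE",
    "CASES": [("SUB", "ARGS"), ("OBJ", "ARGS")],
    "PREPOSITIONS": {},
}
MATCH_ENTRY = {"CHECK1": ("ISA", "#DATA"), "MEANING": "ARGS"}


def instantiate_template(word: str, unattached_preps: Iterable[str]) -> dict:
    """Build a definition for an unknown verb from the predicate template."""
    definition = copy.deepcopy(PREDICATE_TEMPLATE)   # never mutate the template
    definition["DEFINITION-OF"] = word.upper()
    for prep in unattached_preps:
        # "match" stands for any preposition nothing else could attach
        definition["PREPOSITIONS"][prep.upper()] = dict(MATCH_ENTRY)
    return definition
```

For the graph example, `instantiate_template("reach", ["from"])` gives a definition equivalent to the one for "reach" in Figure 5.9.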




5.2.3 Procedural embedding

Most of the interpreter's knowledge about programming is represented by
procedures. This information is necessary in order to incorporate what the program
designer has said into the program specification without asking questions which the
designer would feel his statements have implicitly addressed. It is not intended to
help the Interpreter from a problem-solving standpoint (eg., writing efficient
algorithms from inefficient descriptions). The information was modelled procedurally
since this seemed to provide the easiest way to encode and apply it. The
disadvantages of the procedural approach (primarily opacity) do not apply, as the
information encoded in the procedures is not needed elsewhere in the system.

The information is organized into several modules which are expert in building
various constructions in the program specification. There are modules which build
CONDs from a series of CASEs, construct COMPUTEs, note scoping ambiguities, build
quantified expressions from phrases like "all relations in the concept not in the
scene...", etc. As an example, we will consider the EXIT-TEST module.

The EXIT-TEST module is responsible for setting up the exit conditions of loops. Its
arguments are the loop and the phrase which indicates the exit condition. The
method for building a loop from each of the phrases it knows about is simply
programmed out. Here is an example.


Figure 5.10 contains a fragment of a program specification.




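[Diagram: a LOOP component whose steps are an INPUT component and a CALL
component. The args of the INPUT point to a SET component, which is also one of
the args of the CALL; the CALL also has a procedure.]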

EXIT-TEST receives [#INPUT (ARGS "Quit")] and the LOOP as its input. When the
phrase is an #INPUT concept, EXIT-TEST finds an INPUT in the loop and places a
test for the ARGS of the #INPUT concept after it. The result is shown in Figure
5.11.




The exit test building program has added four new components: the CASE
component which is the exit test, an EQUAL component which is the condition of the
exit test, and a STRING and an ALTERNATIVE component. The ALTERNATIVE component,
which replaced the SET as the argument to the INPUT, reflects the fact that the
arguments to the INPUT may now be either the SET or a STRING whose value is
"Quit". The ALTERNATIVE has been installed as one of the arguments of the exit
test, while the SET remains as one of the arguments to the CALL following the test.
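The following minimal Python sketch performs the #INPUT case of EXIT-TEST on a dict-encoded loop, adding exactly the four components named above: STRING, ALTERNATIVE, CASE and EQUAL. The encoding and the function name are hypothetical, and only the #INPUT phrase is handled. The text does not say what the CASE's steps are, so the sketch gives the CASE only its condition.

```python
def exit_test(loop: dict, phrase: dict) -> dict:
    """Add an exit test to `loop` for an #INPUT phrase such as [#INPUT (ARGS "Quit")]."""
    if phrase["concept"] != "#INPUT":
        raise NotImplementedError("only the #INPUT case is sketched")
    steps = loop["steps"]
    i = next(k for k, step in enumerate(steps) if step["type"] == "INPUT")
    inp = steps[i]
    string = {"type": "STRING", "value": phrase["ARGS"]}
    # The INPUT may now read either its old argument (the SET) or the STRING.
    alternative = {"type": "ALTERNATIVE", "alternatives": [inp["args"][0], string]}
    inp["args"][0] = alternative
    # The exit test compares what was read with the STRING.
    case = {"type": "CASE",
            "condition": {"type": "EQUAL", "args": [alternative, string]}}
    steps.insert(i + 1, case)   # placed right after the INPUT, before the CALL
    return case
```

As in the text, the CALL keeps a direct reference to the SET, while the INPUT and the exit test share the new ALTERNATIVE.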




5.3 The processing cycle

The processing cycle refers to the sequence of actions taken by the Interpreter
during the processing of a user reply. The cycle begins with the receipt of a
question and user reply from the PSI dialogue module. The reply may be a phrase or
any number of sentences. The question typically consists of a descriptor slot and a
component (the question object) which is missing information for the slot (eg.,
(ARGS X) means "What are the arguments for X?").

The first action taken by the Interpreter is to update the Focus to the object of the
question. Section 5.4 explains the use of the Focus and its companion, the Data
Focus.

Then each sentence in the reply is parsed and the result is analyzed. The analysis
consists of determining which concepts the sentence invokes, finding (or creating)
components to fill in the descriptor slots of these concepts, and instantiating the
concepts found into components in the program specification. Analysis has several
side effects besides the building of the specification.

Throughout analysis, the Focus and Data Focus are constantly updated to reflect
the components the program designer is talking about.

Another important side effect is that questions are posed by the instantiation of
incomplete concepts. For instance, the reply,

"It reads a scene, tests whether it fits the concept, verifies the
result of this test with the user, and updates the concept. Then
it repeats the process."

causes the questions,


What is the structure of the scene?

What is the structure of the concept?

What is the initial value of the concept?

Describe verifying the test result.

Describe updating the concept.

Describe the test of whether the scene fits the concept.

What is the exit test of the loop?

to be placed on the question queue.

The instantiation of an incomplete concept may also lead to a job being put on the
background job queue. The background job queue consists of questions which the
Interpreter cannot answer immediately, but expects to be able to answer after
more information has come in. If the information never arrives, the Interpreter
assumes that the program designer was leaving the implementation to the PSI
coding module. These questions are placed on the background job queue (rather
than the question queue) to ensure that they will never be asked of the
user. The background job queue is implemented as a list of procedures and their
arguments, which are run at the end of every processing cycle. Those that
succeed in answering their questions are removed from the queue. An example of a
background job is the one associated with the #ASSOCIATE concept. #ASSOCIATE is
used by the Interpreter as an intermediate representation of the fact that two
DATAs are somehow being associated. For instance, in

"Cookbook reads a recipe list, and then repeatedly reads a name
and prints the recipe with that name"

"with that name" maps into an #ASSOCIATE whose args are "the recipe" and "the
name". At this point, there is no way to tell how the program designer expects
"names" and "recipes" to be associated, so a background job is set up. A
background job is used rather than a question because, if an answer is never found,
the PSI coder will be able to choose an efficient implementation; in fact, the
user may be too unsophisticated to answer such a question. The background job
remains active until the program designer says,

"A recipe has a name, an ingredient-list, and directions."

This defines "recipe" as a record structure with three fields, one of which is a name.

One of the situations the #ASSOCIATE background job knows how to resolve is the
case where one of the associated DATAs is a field of the other. It changes the
#ASSOCIATE assertion from

[ASSOCIATE arg1: A1 arg2: A2]
to

[EQUAL args: ([FETCH arg1: A1 label: NAME] A2)]

where A1 and A2 point to the recipe and name, and FETCH is the interpreter
primitive which gets the DATA of the label FIELD of its ARG1.
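The following minimal Python sketch shows the background job queue as a list of (procedure, arguments) pairs, run at the end of each cycle and pruned of the jobs that succeed. It includes an #ASSOCIATE job that rewrites the assertion once one DATA turns out to be a field of the other. The specification encoding (a dict of components plus an `assertions` list) and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Job = Tuple[Callable[..., bool], tuple]


def associate_job(spec: dict, a1: str, a2: str) -> bool:
    """Background job for [ASSOCIATE arg1: a1 arg2: a2].

    Succeeds once one DATA is known to be a RECORD with a field holding the
    other; the assertion is then rewritten as EQUAL of a FETCH and that DATA.
    """
    for rec_id, other in ((a1, a2), (a2, a1)):
        record = spec.get(rec_id, {})
        if record.get("type") != "RECORD":
            continue
        for fld in record["fields"]:
            if fld["data"] == other:
                spec["assertions"].remove(("ASSOCIATE", a1, a2))
                spec["assertions"].append(("EQUAL", ("FETCH", rec_id, fld["name"]), other))
                return True
    return False


def run_background_jobs(queue: List[Job], spec: dict) -> None:
    """Run every job at the end of a cycle; drop the ones that succeeded."""
    queue[:] = [(job, args) for job, args in queue if not job(spec, *args)]
```

Before "A recipe has a name, ..." is processed, the job fails and stays queued. Afterwards, it rewrites the assertion and leaves the queue.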

When each sentence in the program designer's reply has been analyzed, the
background jobs are run and the question list is examined to see if any of the
questions have been answered by subsequent analysis. The revised question list is
sent to the PSI dialogue module, which selects a question, gets a reply from the
program designer, and gives the question chosen and the designer's response to
the interpreter to start another cycle.




5.4  Matching 

This  section  is  concerned  with  the  identification  of  English  noun  phrases,  which 
occurs  during  the  filling  in  of  a concept's  descriptor  slots,  and  consists  of  finding 
the  component,  or  creating  the  component  if  none  exists,  which  is  the  contents  of 
the  descriptor  slot  being  filled,  based  on  the  English  presentation  of  the  component 
(eg.,  the  noun  phrase). 

The system's handling of pronouns and nouns is virtually the same. The only
difference lies in the possible match set. A pronoun may match any component in
the specification which has been mentioned and meets the syntactic requirements
(eg., plural, animate etc.) of the pronoun. A noun may match any component in the
specification which has been referred to in the same (or a synonymous) way. The
key to the matching process is the context supplied by the concept whose slot is
being filled.

5.4.1 Nouns

The first time a noun is used, the system creates a component which is indexed
under the noun's definition. Thus, "It reads in a scene." would cause the
component:

A1

class DATA
rep !#SCENE

to be created, where !#SCENE is a definition the Interpreter creates for "scene".
!#SCENE is assumed to be a #DATA so that it satisfies the type constraints of the
ARGS of an #INPUT. Associated with !#SCENE is the fact that A1 is an
instantiation of "scene". The situation we have outlined leads to the simplest kind
of matching. If the user says, "Print the scene.", "the scene" is matched to A1
because the "the" implies that the referent should be found in the specification, A1
is the only instantiation of "scene" in the specification, and it satisfies the type
constraints of the ARGS of #OUTPUT.
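The following minimal Python sketch illustrates this simplest kind of noun matching. Components are indexed under the noun's definition. "The" looks for an existing instantiation that passes the slot's type check, and otherwise a new component is created. Two simplifications are assumptions: the `Specification` class stands in for the Interpreter's store, and "most recently mentioned" is approximated by "most recently created". The type check is reduced to equality of a single `isa` label.

```python
from typing import Dict, List


class Specification:
    """Hypothetical store of components indexed under noun definitions."""

    def __init__(self) -> None:
        self.by_definition: Dict[str, List[dict]] = {}
        self.count = 0

    def create(self, definition: str, isa: str) -> dict:
        self.count += 1
        comp = {"id": f"A{self.count}", "class": "DATA", "rep": definition, "isa": isa}
        self.by_definition.setdefault(definition, []).append(comp)
        return comp

    def match_noun(self, definition: str, determiner: str, required_isa: str) -> dict:
        """'the' looks for an existing instantiation that passes the slot's type
        check (newest first); otherwise, or if none fits, a component is created."""
        if determiner == "the":
            for comp in reversed(self.by_definition.get(definition, [])):
                if comp["isa"] == required_isa:
                    return comp
        return self.create(definition, required_isa)
```

"It reads in a scene." creates A1 under !#SCENE, and the later "Print the scene." finds A1 again rather than creating a new component.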


Now consider a slightly more complicated situation. Suppose we have scenes and
concepts, each of which is a set of relations. Further, the relations in the concept
are marked either "possible" or "necessary". Figure 5.12 shows how this would be
represented in the program specification.


A1
    class DATA
    type SET
    rep CONCEPT
    element -> A2

A3
    class DATA
    type SET
    rep SCENE
    element -> (a RELATION component)

A2
    class DATA
    rep RELATION
    assertions -> (the EQUAL component below)

A4
    class DATA
    rep RELATION

(EQUAL component)
    class ALG
    type EQUAL
    args -> (the IMAP component) (the ALTERNATIVE component)

(IMAP component)
    class ALG
    type IMAP
    arg1 -> A5
    arg2 -> A2

A5
    class DATA
    type MAPPING
    name MARK

(ALTERNATIVE component)
    class DATA
    type ALTERNATIVE
    alternatives -> (the two STRINGs below)

(STRING)
    class DATA
    type STRING
    value "possible"

(STRING)
    class DATA
    type STRING
    value "necessary"


Figure 5.12

Scenes, concepts and relations


The user says, "Print the relations in the concept which are marked 'possible'",
which is parsed to,




(!#PRINT NN
   [STEPOF YOU*]
   [ARGS (RELATION PL THE (IN (CONCEPT THE))
           (!#MARK PN
              [ARG2 !match_to_head_noun]
              [ARG3 "possible"]))]
)

The Interpreter must find (or create) a component which can be used as the ARGS
of the #OUTPUT that !#PRINT maps to. If the noun group were simply "the relations",
the Interpreter would match it to A2 or A4, whichever was mentioned last. But in
this case, there are modifiers which will presumably narrow down the choice.

The first modifier is the prepositional phrase "in the concept". The #DATA concept
(Figure 5.6) is used to determine the meaning of the modification; it is (#MEMBER
A6 A1), where "the concept" has been matched to A1 and A6 is being used to
represent the DATA which will be the final answer to the match. #MEMBER is
treated as a special case in the matching process. The first #MEMBER in the
modifier list which is not negated³ and whose ARG1 is the noun in question is
transformed to the descriptor-slot/value pair (ELEMENTOF X), where X is the
ARG2 of the #MEMBER. So in this case, the #MEMBER is resolved to (ELEMENTOF
A1). Following the ELEMENT slot of A1 leads to A2, which becomes the only match
possibility. If there were no more modifiers, the match process would return A2 as
the "relations in the concept".

The next modifier is a #MAP. The post condition of #MAP (Figure 5.5) is filled in
with the #MAP descriptors, yielding (#EQUAL (#IMAP A5 A6) "possible"). If this did
not contradict the assertion list of A2, then A2 would be returned as the meaning of
the noun phrase. It does, though, since the assertion list of A2 asserts that a
relation in the concept may be marked either "possible" or "necessary". Therefore
a new component must be created, one which is the generic element of the subset of
A1 consisting of all relations marked "possible". This is accomplished via the
SUBSET module, which is another example of a small bit of knowledge being bound
up in a procedure. The SUBSET module takes a set and an assertion list and
creates a COMPUTE component which builds the subset. The COMPUTE created is
shown in Figure 5.12.

³ In "The relations which are not in the concept", the meaning of the prepositional
modification is (#NOT (#MEMBER A6 A1)), which is inserted in the assertions list.


    class ALG
    type COMPUTE
    quantify ALL
    on A1
    result *   (points to A7)
    assertions *

A7
    class DATA
    type SET
    element *   (points to A6)

A6
    class DATA
    rep RELATION
    assertions *

    class ALG
    type EQUAL
    args * *

    class ALG
    type IMAP
    arg1 A5
    arg2 *

    class DATA
    type STRING
    value "possible"

Figure 5.12

The COMPUTE for "The relations in the concept marked 'possible'."


A6 is the result of the matching process. The COMPUTE is inserted into the program
specification when the "print" OUTPUT component is.
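To make the matching steps above concrete, here is a minimal Python sketch of the two modifier cases just walked through: the #MEMBER modifier resolved to (ELEMENTOF X), and the #MAP assertion that either leaves A2 as the answer or forces a SUBSET-style COMPUTE. It is an illustration under stated assumptions, not the Interpreter's code: the `Component` class, the encoding of an assertion list as a dictionary from mapping name to allowed values, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Component:
    """Hypothetical stand-in for a program-specification component."""
    name: str
    cls: str                                         # "DATA" or "ALG"
    descriptors: dict = field(default_factory=dict)  # slot name -> value


def resolve_member(set_comp: Component) -> Component:
    """(#MEMBER A6 X) becomes (ELEMENTOF X): follow X's ELEMENT slot."""
    return set_comp.descriptors["element"]


def apply_map_modifier(candidate: Component, set_comp: Component,
                       mapping: str, value: str, spec: list,
                       elem_name: str, set_name: str) -> Optional[Component]:
    """Narrow `candidate` by (#EQUAL (#IMAP mapping candidate) value).

    Assumption: an assertion list is modelled as {mapping name: allowed values}.
    """
    allowed = candidate.descriptors.get("assertions", {}).get(mapping)
    if allowed is None or allowed == {value}:
        # Nothing is known, or the value is already fixed: keep the candidate.
        return candidate
    if value not in allowed:
        return None  # the modifier cannot hold for this candidate
    # The candidate only says the value MAY be `value`, so build the subset
    # of set_comp whose elements satisfy the assertion, via a COMPUTE.
    element = Component(elem_name, "DATA", {
        "rep": candidate.descriptors.get("rep"),
        "assertions": {mapping: {value}},
    })
    subset = Component(set_name, "DATA", {"type": "SET", "element": element})
    compute = Component("COMPUTE", "ALG", {
        "type": "COMPUTE", "quantify": "ALL", "on": set_comp,
        "result": subset, "assertions": {mapping: {value}},
    })
    spec.append(compute)  # later inserted along with the "print" OUTPUT
    return element


# "the relations in the concept which are marked 'possible'"
a2 = Component("A2", "DATA", {"rep": "RELATION",
                              "assertions": {"MARK": {"possible", "necessary"}}})
a1 = Component("A1", "DATA", {"type": "SET", "rep": "CONCEPT", "element": a2})
spec: list = []
match = apply_map_modifier(resolve_member(a1), a1, "MARK", "possible",
                           spec, "A6", "A7")
print(match.name, spec[0].descriptors["type"])  # A6 COMPUTE
```

The names A1, A2, A6 and A7 follow Figure 5.12. Representing "may be marked either 'possible' or 'necessary'" as a set of allowed values is what lets the sketch distinguish an assertion that is already true of A2 from one that narrows the set.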




5.4.2  Pronouns 

As we have indicated, the difference between pronoun reference and noun
reference is in the possible match set. The Interpreter keeps track of two special
components, the Focus and the Data Focus, which are used to help reduce the number
of pronoun match possibilities.

When the program designer begins his reply, the Focus refers to the object of the
question. During the processing of the program designer's reply, the Focus
changes, so that it always points to the last component modified by the Interpreter.
We are making a distinction between "modifying" and "creating" a component. For
example, the phrase "It tests the concept" will cause a CALL component to be
created with ARGS "concept"; we do not consider the CALL component to have
been modified until some of its other descriptors (e.g., PROCEDURE) have been filled.
The Data Focus is the last DATA component which has been modified, described as a
part of another DATA, or used as the ARGS or ARG1 of an ALGorithm component. The
rules for the Focus and the Data Focus have been selected so that they are the
most likely referents for any pronouns used by the program designer. Of course,
they still must satisfy the requirements of the descriptor they are being proposed
for. If they don't, the Interpreter falls back on searching for a referent in the
pronoun reference list, which is a list of each component that has been mentioned
by the program designer.
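The Focus/Data Focus policy described above can be sketched as a small resolution function. This is a hypothetical simplification: descriptor requirements are modelled as Python predicates, and the fallback search order over the pronoun reference list (most recent mention first) is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(eq=False)
class Comp:
    """Hypothetical minimal component: a name and its class."""
    name: str
    cls: str  # "DATA" or "ALG"


def resolve_pronoun(requirement: Callable[[Comp], bool],
                    focus: Optional[Comp],
                    data_focus: Optional[Comp],
                    mentioned: List[Comp]) -> Optional[Comp]:
    """Try the Focus, then the Data Focus; otherwise search the reference list."""
    for candidate in (focus, data_focus):
        if candidate is not None and requirement(candidate):
            return candidate  # no other possibilities are checked
    # Fallback: the pronoun reference list, most recent mention first
    # (this ordering is an assumption of the sketch).
    for candidate in reversed(mentioned):
        if requirement(candidate):
            return candidate
    return None


def is_alg(c: Comp) -> bool:
    return c.cls == "ALG"


def is_data(c: Comp) -> bool:
    return c.cls == "DATA"


program = Comp("program", "ALG")
scene = Comp("scene", "DATA")
mentioned = [program, scene]

# "It reads a scene": INPUT needs an ALGorithm SUB, so the Focus wins.
print(resolve_pronoun(is_alg, program, scene, mentioned).name)   # program
# "tests whether it fits": "fit" needs a DATA, so the Data Focus wins.
print(resolve_pronoun(is_data, program, scene, mentioned).name)  # scene
```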

We can see how this works on the following question/reply pair:

PSI:  Describe the program.

USER: It reads a scene, tests whether it fits the concept,
      verifies the result of this test with the user, and
      updates the concept. Then it repeats the process.




The question sets the Focus to "program". The first "it" is matched to the Focus,
since "input" requires that its SUB be an ALGorithm. The Data Focus is set to
"scene" because "scene" is the ARGS of the most recently created ALGorithm
component (the INPUT). The second "it" is matched to the Data Focus, since the
Focus is not a DATA (as is required by the ARGS of "fit"). The third "it" is matched
to the Focus, since the STEPOF of "repeat" must be an ALGorithm. Note that none
of "test", "verify", or "update" were proposed as referents for the third "it", even
though they are all ALGorithm components. If there is no reason not to use the
Focus or Data Focus as the referent, no other possibilities are checked.

When the Data Focus and the Focus both refer to DATAs, the preference checks
given in the concepts are used to choose between the two. Consider the
dialogue fragment below:

The two major data structures in the program are the
concept and the scene. The concept is a set, which is read
at the start of the program. The scene has two parts. The
first part is a name. The second part is a list.

1. It should be read in after the concept.

2. It consists of three elements.

Either sentence 1. or 2. can logically follow the preceding paragraph, yet the "it" in
1. refers to the "scene", which is the Focus, and the "it" in 2. refers to the "list",
which is the Data Focus. In 1., the choice between the two is resolved by the
CHECK2 of #INPUT. The check prefers that the ARGS of #INPUT should not be parts
of other components, or ARGS of an already instantiated #INPUT. Since the "list" is
part of the scene, the "scene" is preferred as the referent. A similar process is
used to find "list" as the proper match in 2. The definition of "consists" that
succeeds is one that assigns the structure of the OBJ to the SUB. Naturally, it
prefers that its SUB have either no structure, or a structure which does not conflict
with the OBJ. Since "scene" is known to be a RECORD with two fields, "list" is
preferred for the match.
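The tie-breaking just described, where both the Focus and the Data Focus pass the type check and a concept's preference check decides, might look roughly like this. The `Data` class and both check functions are hypothetical approximations of #INPUT's CHECK2 and the "consists" preference, and keeping the Focus when the preference does not separate the two candidates is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Data:
    """Hypothetical DATA component with just enough structure for the checks."""
    name: str
    part_of: Optional["Data"] = None
    fields: List["Data"] = field(default_factory=list)
    read_by_input: bool = False


def choose(focus: Data, data_focus: Data,
           prefers: Callable[[Data], bool]) -> Data:
    """Both candidates already pass the type check; let a preference decide.

    If the preference does not separate them, keep the Focus (an assumption).
    """
    if prefers(data_focus) and not prefers(focus):
        return data_focus
    return focus


def input_check2(c: Data) -> bool:
    """Approximates #INPUT's CHECK2: ARGS should not be parts of other
    components, nor ARGS of an already instantiated #INPUT."""
    return c.part_of is None and not c.read_by_input


def consists_of_three(c: Data) -> bool:
    """Approximates the "consists" preference: the SUB has no structure, or
    one that does not conflict with the OBJ (here, three elements)."""
    return not c.fields or len(c.fields) == 3


scene = Data("scene")
name = Data("name", part_of=scene)
lst = Data("list", part_of=scene)
scene.fields = [name, lst]

print(choose(scene, lst, input_check2).name)       # scene  (sentence 1.)
print(choose(scene, lst, consists_of_three).name)  # list   (sentence 2.)
```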

The methods we use for resolving reference amount to a heuristic filtering of
possible referents (the Focus and Data Focus) followed by type checking on the
surviving candidates. This works because the objects in our domain are easily
classifiable, as are the effects (represented by which slots the objects have filled)
of various actions upon them. Furthermore, the fact that we are talking about
programming severely limits the number of different contexts in which things can
be said, which means that the preference checks associated with each component
are likely to be consistently correct. Also, a conscientious program designer will
probably find himself not using pronouns when he is intentionally violating these
preferences. For instance, if one really wanted to write a program in which the "it"
in 1. referred to the "name", he would find himself saying, "The name should be input
after the concept".

For difficult reference problems, the Interpreter relies on the power of the
situational checks associated with each concept's descriptors. Section 1.6.2
provided an example of their use in noun reference. In some respects, the
situational checks are equivalent to methods proposed in other systems. [Hobbs
77] presents a system in which some pronoun reference is achieved by "detecting
intersentence relations". One such relation is:

A sentence asserts a change, and the following sentence
presupposes the final state of that change.

When there is a reference problem, it is resolved in a way which realizes an
intersentence relation. The relation above helps match the "it" in 1., 2. and 3.
below,

1. Decrease N by 1. If it is 0, reset it to MAX.

2. Decrease N by J. If it is 0, reset it to MAX.

3. Subtract J from N. If it has thereby gone down to 0, reset it to MAX.

since N was changed in the first sentence and the second sentence has assumed
(via the "if") the final state of "it". If "it" is matched to "N", the pattern holds; if it
is matched to either "1" or "J", it does not.

The Interpreter achieves the same effect by associating a situational check with
the ARGS of #EQUAL which prefers that one of the ARGS be a variable whose value
has been changed. Advocating such rules lays one open to charges of "ad
hockery", but the situational checks are used for both noun and pronoun reference,
as well as for the parser/interpreter interface. When an individual check seems
obscure, it is only because it reflects something which people rarely think about
consciously. It is true, of course, that the situational checks currently
associated with each concept are not yet complete enough to handle all the
reference problems one might encounter. However, the system's heuristics enable
it to cope well with reference problems it must handle without complete
information. For instance, even though the three sentences from [Hobbs 77] were
chosen to break the usual pronoun heuristics (the first introduces the problem, the
second refutes the "0 shouldn't equal 1" method, and the third disproves the
"positional" hypothesis), the Interpreter would have found the correct referent in
each case even with the #EQUAL situational check omitted. The Data Focus in all three
sentences is "N", since it is the ARG1 of the most recently created component (the
SUBTRACT), and in the absence of any other information, it would be chosen as the
referent of "it".
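A minimal sketch of the #EQUAL situational check, applied to the three [Hobbs 77] sentences, follows. It assumes candidates are plain variable names and that the Interpreter knows which variables were just changed; the function name and the fall-back to the Data Focus when the check is inconclusive are illustrative assumptions, not the Interpreter's actual code.

```python
from typing import Optional, Sequence, Set


def equal_arg_check(candidates: Sequence[str], changed: Set[str],
                    data_focus: Optional[str]) -> Optional[str]:
    """Prefer the single candidate whose value has been changed; if the check
    does not single one out, fall back to the Data Focus."""
    preferred = [c for c in candidates if c in changed]
    if len(preferred) == 1:
        return preferred[0]
    return data_focus


# Candidate referents for "it" in sentences 1., 2. and 3.; in each, N is the
# variable just changed and also the Data Focus (the ARG1 of the SUBTRACT).
sentences = [["N", "1"], ["N", "J"], ["J", "N"]]
for cands in sentences:
    print(equal_arg_check(cands, changed={"N"}, data_focus="N"))  # prints N for each
```

Passing an empty `changed` set models the check being omitted; the sketch then returns the Data Focus, "N", in every case, which matches the observation above.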

5.4.3 Matching to implicitly mentioned components

Often, the Interpreter will have to match to a component which has been implicitly
mentioned by the user. A simple example of this can be seen in the phrase,

"...classify the scene and print the result."

"Result" refers to the result of the classification. The methods described above
would simply look for a component indexed by "result" and, not finding one, would
create a new component as the result of the match. The solution is to do a little
preprocessing before the matching process begins. Whenever a component is
created which has a result (in the example sentence, the CALL component created
by "classify"), a DATA component is instantiated and then indexed through "result"
and its synonyms, as well as through any default indexing set up by the verb's
definition (e.g., "classification" for "classify", as shown in Figure 5.8).
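The preprocessing step can be sketched as follows, assuming a simple word-to-components index. The `Spec` class, its methods and the `RESULT_SYNONYMS` tuple are hypothetical; only the behaviour (index a result DATA under "result", its synonyms, and the verb's default words at creation time, so later matching finds it) comes from the description above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical synonym list for "result"; the system's real lexicon is not shown.
RESULT_SYNONYMS = ("result", "outcome")


class Spec:
    """Toy program specification with a word -> components index."""

    def __init__(self) -> None:
        self.index: Dict[str, List[dict]] = defaultdict(list)

    def create(self, kind: str, verb: str = "", has_result: bool = False,
               default_index: Sequence[str] = ()) -> dict:
        comp = {"kind": kind, "verb": verb}
        if has_result:
            # Preprocessing: instantiate the result DATA now and index it
            # under "result", its synonyms, and the verb's default words.
            result = {"kind": "DATA", "result_of": comp}
            comp["result"] = result
            for word in (*RESULT_SYNONYMS, *default_index):
                self.index[word].append(result)
        return comp

    def match_noun(self, word: str) -> dict:
        """Return the latest component indexed by `word`, else create one."""
        hits = self.index.get(word)  # .get does not trigger the default factory
        if hits:
            return hits[-1]
        new = {"kind": "DATA", "noun": word}
        self.index[word].append(new)
        return new


spec = Spec()
classify = spec.create("CALL", verb="classify", has_result=True,
                       default_index=("classification",))
print(spec.match_noun("result") is classify["result"])  # True
```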

A more subtle example occurs during proposed interchanges between the desired
program and its user. Consider what might follow the sentence,

"I'll request a story by typing a key word".

The program designer might say nothing, in which case the system should ask how
the request should be answered. Or the user might follow immediately with a
description of how the request should be handled. And finally, the user might just
say what the "reply" should be. In that case, it is up to the system to realize that
"reply" refers to the answering process, and that the "reply" should be printed out.




Verbs which imply an interchange of data between the program and its user (e.g., ask,
request, answer, etc.) are mapped into INTERCHANGE concepts. INTERCHANGE concepts
are represented in the specification by a SEQ with the appropriate steps. The SEQ
is set up by a procedure associated with #INTERCHANGE. When the program is
asking something of the user, the procedure's execution results in a SEQ whose
first step is an OUTPUT component. A DATA is created which is indexed to "reply"
(and "reply" synonyms), and a background job is set up to complete the SEQ if the
user says nothing further. Completing the SEQ consists of setting up an INPUT
component whose ARGS is the "reply" DATA set up by the #INTERCHANGE procedure.
If the program is responding to a user query, the #INTERCHANGE procedure sets up
a SEQ whose first step is an INPUT, along with a "reply" DATA. A slightly different
background program is used, however, which sets up a SEQ that takes care of the
processing required to answer the user's query. The #INTERCHANGE background
job does nothing if the "reply" DATA has been used as the ARGS of the last INPUT or
OUTPUT of the INTERCHANGE SEQ. This machinery allows the Interpreter to handle
the following examples:

"Output the result of the test, ask the user if this is correct, and
read in the user's response."

In this example, the designer has followed the #INTERCHANGE ("ask") with a
description of the remainder of the #INTERCHANGE. "Response" matches to the
"reply" DATA set up by the #INTERCHANGE procedure, and the dialogue continues.
The #INTERCHANGE background job does nothing, since the "reply" DATA is in the ARGS
of an INPUT (the "read"). If the user had said only, "...and ask the user if this is
correct.", the background job would have been called to create an INPUT with the
"reply" DATA as ARGS.
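The #INTERCHANGE machinery can be approximated by a SEQ plus a background job that fills in the missing step. This Python sketch is a simplification: the `Interchange` and `Step` names are hypothetical, the reply DATA is a bare dictionary, and the user-query case is reduced to appending an OUTPUT of the reply rather than the full answering SEQ the text describes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Step:
    kind: str                    # "INPUT" or "OUTPUT"
    args: Optional[dict] = None


class Interchange:
    """Sketch of the SEQ built for an #INTERCHANGE verb."""

    def __init__(self, program_asks: bool) -> None:
        self.program_asks = program_asks
        # DATA indexed by "reply" and its synonyms (indexing not modelled here).
        self.reply = {"kind": "DATA", "indexed_as": ["reply", "response"]}
        first = "OUTPUT" if program_asks else "INPUT"
        self.seq: List[Step] = [Step(first)]

    def background_job(self) -> None:
        """Complete the SEQ unless the designer already used the reply."""
        last = self.seq[-1]
        if last.kind in ("INPUT", "OUTPUT") and last.args is self.reply:
            return
        # Program asked: read the user's reply. User asked: print the reply.
        kind = "INPUT" if self.program_asks else "OUTPUT"
        self.seq.append(Step(kind, self.reply))


# "...ask the user if this is correct, and read in the user's response."
ask = Interchange(program_asks=True)
ask.seq.append(Step("INPUT", ask.reply))  # "response" matched to the reply
ask.background_job()
print(len(ask.seq))  # 2: the background job added nothing
```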




An example of a user-initiated #INTERCHANGE is:

PSI:  Describe the program.

USER: It has a data base of news stories. Each story has a set
      of key words associated with it. I'll request a story by giving a
      key word. The response should be all the stories with that key
      word.

"Request" sets up an #INTERCHANGE. "Response" is matched to the "reply" DATA,
and the background program sets up an OUTPUT to print the "response" (as defined
by the program designer) to the user.

5.4.4 Coercion

The type restrictions implemented in the definitions and concepts are too strict to
account for casual language usage. People often refer to an object by one of its
parts, to a part of an object by the entire object, to an attribute of an object by
the object, etc. The Interpreter must be able to "coerce" the component the user
has specified into the one he really meant, i.e., the one which satisfies the type
constraints of the descriptor slot being filled.

For instance, suppose the user defines a graph as "a set of nodes and a mapping
which maps a pair of nodes into an edge." The Interpreter assumes that a graph is
a record with two fields, a set and a mapping. Then if the user mentions "the nodes
in the graph", the Interpreter, if using a strict interpretation of type restrictions, will
fail to understand, since the meaning of "in" leading to #MEMBER requires that its
object be a #SET. This is just a specific case of the more general rule, "If X is a record
and fails to satisfy a type check, the speaker may have intended one of the fields
of X."




The Interpreter's type checking is implemented through the function ISA and the
more complex secondary checks. ISA returns False if its object fails to satisfy the
check, and a component if the object satisfies the check. The component may be
the original object or, if the object fails to satisfy the type but can be coerced into
it, the component resulting from the coercion. Thus if (ISA X #SET) is evaluated and
X is a record structure with a field whose DATA is the set Y, then the result of the
evaluation will be Y, and Y will be used to fill the descriptor slot.
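A rough Python rendering of ISA with the two coercions discussed in this section (a record standing for one of its fields, and an INPUT operation standing for the DATA it reads) is shown below. The `Comp` class is hypothetical, and coercing a record only when exactly one field fits is an assumption added to keep the sketch unambiguous.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(eq=False)
class Comp:
    """Hypothetical component: class, type, record fields, and ARGS."""
    name: str
    cls: str                                   # "DATA" or "ALG"
    type: str                                  # e.g. "SET", "RECORD", "INPUT"
    fields: List["Comp"] = field(default_factory=list)
    args: Optional["Comp"] = None


def satisfies(x: Comp, wanted: str) -> bool:
    return x.type == wanted or x.cls == wanted


def isa(x: Comp, wanted: str) -> Union[Comp, bool]:
    """Return x if it has the wanted type, a coerced component if one of the
    sketched coercions applies, and False otherwise."""
    if satisfies(x, wanted):
        return x
    # A record that fails the check may stand for one of its fields
    # (only when exactly one field fits: an assumption of this sketch).
    if x.type == "RECORD":
        fits = [f for f in x.fields if satisfies(f, wanted)]
        if len(fits) == 1:
            return fits[0]
    # An INPUT operation required to be a DATA stands for what it reads.
    if x.type == "INPUT" and wanted == "DATA" and x.args is not None:
        return x.args
    return False


nodes = Comp("nodes", "DATA", "SET")
edges = Comp("edge-map", "DATA", "MAPPING")
graph = Comp("graph", "DATA", "RECORD", fields=[nodes, edges])
print(isa(graph, "SET").name)  # nodes: "the nodes in the graph"
```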

This type of matching allows the Interpreter's matching rules to be written with a
great deal of flexibility. In section 1.5.2, we used

"It reads in a trial-item, matches the input to the internal concept
model, and prints the result of the match."

to illustrate how "input" is matched to "trial-item" rather than to "the read ("input")
operation" because of the requirement that the ARGS of "match" be a #DATA. This is
actually implemented through the coercion feature. In the absence of a component
being explicitly referred to as an "input", the matching process looks for an #INPUT
operation. When an INPUT is found and is required to be a #DATA, ISA returns the
ARGS of the INPUT.

5.5 The Reader/Interpreter Interface

The Reader function Format is the interface between Reader and the Interpreter.
Section 4.1 listed the criteria used by Format to supply each parse structure with a
measure. Reader uses the measures to choose among competing parse
structures. The information required for measuring is:

1.  Does  the  verb  have  all  Its  required  cases? 

2.  Are  the  case  contents  of  the  verb  understandable? 




3.  Do  the  case  contents  satisfy  the  case  requirements? 

The Interpreter supplies the measure information through its concepts and
definitions. Whether a verb has all its cases can be read directly from the
definition. If it is missing cases the definition has marked "Must", the rating is
unacceptable. If it has all the Must cases but is missing cases marked "Preferred",
the rating is acceptable. Otherwise it is perfect.

Determining whether the case contents are understandable consists of checking
that the meanings of all modifications in the case contents are covered by definitions.
If they are not all covered, the rating is unacceptable. If they are covered, but not
all contextual checks in the relevant definitions are satisfied, the rating is
acceptable. Otherwise it is perfect.

Checking that the case contents of a verb satisfy the verb's case requirements
makes use of the descriptor checks in the concept the verb is being mapped to. If
the case satisfies the first check, it is acceptable. If it also satisfies the second
check, then it is perfect. Otherwise, the case is unacceptable.
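The three-valued ratings above lend themselves to a short sketch. The rating functions mirror the three questions listed earlier in this section; how Reader combines the three parts into one measure is not stated here, so taking the weakest part is an assumption, as is treating SUB as a Must case of "store" in the usage example.

```python
from enum import IntEnum
from typing import AbstractSet, Iterable


class Rating(IntEnum):
    UNACCEPTABLE = 0
    ACCEPTABLE = 1
    PERFECT = 2


def rate_cases(present: AbstractSet[str], must: AbstractSet[str],
               preferred: AbstractSet[str]) -> Rating:
    """1. Does the verb have all its required cases?"""
    if not must <= present:
        return Rating.UNACCEPTABLE
    if not preferred <= present:
        return Rating.ACCEPTABLE
    return Rating.PERFECT


def rate_understandable(covered: Iterable[bool],
                        context_ok: Iterable[bool]) -> Rating:
    """2. Are all modifications covered by definitions, with contextual checks met?"""
    if not all(covered):
        return Rating.UNACCEPTABLE
    return Rating.PERFECT if all(context_ok) else Rating.ACCEPTABLE


def rate_case_contents(first_check: bool, second_check: bool) -> Rating:
    """3. Do the case contents satisfy the concept's descriptor checks?"""
    if not first_check:
        return Rating.UNACCEPTABLE
    return Rating.PERFECT if second_check else Rating.ACCEPTABLE


def measure(*parts: Rating) -> Rating:
    # Assumption: the overall measure is the weakest of its parts.
    return min(parts)


# "The program stores" vs. "The program stores data."
# (Assumed definition of "store": SUB is Must, OBJ is Preferred.)
without_obj = measure(rate_cases({"SUB"}, {"SUB"}, {"OBJ"}))
with_obj = measure(rate_cases({"SUB", "OBJ"}, {"SUB"}, {"OBJ"}),
                   rate_case_contents(True, True))
print(without_obj.name, with_obj.name)  # ACCEPTABLE PERFECT
```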

The remainder of this section consists of three examples illustrating how the three
parts of the measure affect the parsing process.

In the sentence "The program stores and retrieves data.", "data" should be viewed
as the object of "stores" as well as "retrieves". As we noted in 4.3.3, this depends
on the meanings of "store" and "data", and is not true for all sentences with this
syntax. The parser decides whether to use "data" as the OBJ of "store"
depending on which is better, the measure of "The program stores" or the measure
of "The program stores data." The measure of the latter is better, since the
definition of "store" states that the OBJ case is preferred, and "data" does not
violate the case preferences of "store".


For an example of the case preferences at work, consider the sentence, "If the
scene fit and the user said the guess was 'correct', then every....". The clause
introduced by "If" has two syntactic readings, namely

[IF (CONJ AND
        (FIT PN
            [SUB (SCENE THE)])
        (SAY PN
            [SUB (USER THE)]
            [WHAT (BE PN
                      [SUB (GUESS THE)]
                      [OBJ "Correct"])]))]

or

[IF (SAY PN
        [SUB AND (FIT THE SCENE) (USER THE)]
        [WHAT (BE PN
                  [SUB (GUESS THE)]
                  [OBJ "Correct"])])]

The definition of "say" which maps to #INPUT requires that the SUB case satisfy
the check (ISA #IO-DEVICE). This gives the first parse a better measure than the
second, since the SUB of the second includes "fit" as part of its compound SUB, and
"fit" cannot be viewed as an #IO-DEVICE.


The noun group "each relation in the concept which is in the scene" provides an
example of the "understandability" criterion. There is no a priori reason for it to
mean

[NOUN (RELATION EACH (IN (CONCEPT THE))
          (1#BE NN
              [ARG1 !match_to_head_noun]
              [ARG2 (SCENE THE)]))]

rather than

[NOUN (RELATION EACH (IN (CONCEPT THE (1#BE NN
                                          [ARG1 !match_to_head_noun]
                                          [ARG2 (SCENE THE)]))))]




But if scenes, concepts and relations had been defined as shown in Figure 5.12, the
first parse would obviously be correct. The first modification in each is perfect.
The reason is that "relation" is a #DATA (Figure 5.6); hence there is a meaning for it
to be modified by a prepositional phrase whose preposition is "in". The meaning of
the modification is #MEMBER, and "concept" satisfies both #MEMBER checks: it is a
set, and its generic element is a "relation". The second modification in the first
parse is also perfect: 1#BE maps to #MEMBER, and "scene" satisfies both checks.
The second modification of the second parse is only acceptable, however, since it
fails the second #MEMBER check: "concept" cannot be viewed as the
generic element of the scene.


5.6  Future  work 

5.6.1  Tense  evaluation 

The Interpreter makes almost no use of the tense information returned by the
parser. This does not affect its performance greatly, as the dialogues it has
handled have all been straightforward linear algorithm descriptions, with no
skipping about into the future or past.

But it is easy to see how the proper interpretation of tense information is necessary
for understanding even the types of dialogues we have been considering. In "Set X
to the tail of X. If the head of X is/was 5, then ..." the use of "is" or "was"
determines whether the program designer means the first or second element of the
original X.




Similarly, in

"Test if the scene fit the concept and print 'fits' if it does. Then
modify the concept. If the scene fits/fit the concept..."

the use of "fit" or "fits" determines whether the "fit" predicate should be
recalculated for the new modified concept, or whether the old value should be
accessed.

5.6.2  More  domain  and  general  programming  support 

Programming and domain knowledge is necessary for several reasons. A system
well versed in programming and domain knowledge will ask fewer unnecessary
questions of the user, thereby making for a more practical system. A well-informed
system will also be able to follow the program designer that much more
easily.

For instance, if the designer says,

"Write me a program which sorts a list of words. The comparison
function should be alphabetical order.",

understanding the second sentence requires knowing something about sorting
programs. Information like this will be forthcoming from the two PSI modules
concerned with domain and general programming support. The modules and the
interface between them and the Interpreter are being developed.




5.6.3  Building  up  more  concepts  and  definitions 

Expanding the Interpreter's collection of concepts and definitions is the most
obvious improvement that can be made to the system. It is impossible for the
Interpreter to understand a primitive idea unless it has a concept to represent that
thought. Thus a simple sentence like "Print the greatest number in the list" cannot
be understood unless the system has the concepts #GREAT and #SUPERLATIVE.
And even if it can understand that sentence, the Interpreter still won't be able to
understand "Print the number in the list which is larger than any other number in the
list" unless it has definitions which map "larger" into #GREAT and "any other
number" into a #SUPERLATIVE.

However, with the proper concepts and definitions, which are easy to write, the
Interpreter can understand these sentences and many more. By having people
exercise the system, and then teaching the system any unknown concepts and
definitions which have been used, we hope to build up a collection of concepts and
definitions which will be comprehensive enough to support most reasonable
dialogues. Appendix A contains dialogues illustrative of the system's current
capabilities.




6.  References 


[Balzer  75] 

Balzer, R., Imprecise Program Specification, Technical Report RR-75-36,
USC/Information Sciences Institute, Marina Del Rey, California, 1975.

[Barstow  77] 

Barstow, D., A Knowledge-based System for Automatic Program Construction,
Proceedings  of  the  Fifth  International  Joint  Conference  on  Artificial 
Intelligence,  1977. 

[Bobrow  76] 

Bobrow, D. and Winograd, T., An Overview of KRL, a Knowledge Representation
Language, Memo 293, Stanford A. I. Project, Stanford University, 1976.

[Brooks  74] 

Brooks, M., Another Approach to English, Working Paper 73, MIT Artificial
Intelligence Laboratory, 1974.

[Bruce  72] 

Bruce,  B.,  A Model  for  Temporal  References  and  Its  Application  in  a Question 
Answering  Program,  Artificial  Intelligence,  Volume  3:  Number  1,  1972. 

[Bruce  75] 

Bruce, B., Case Systems for Natural Language, Artificial Intelligence, Volume 6:
Number 4, 1975.

[Fillmore 68]

Fillmore, C., The Case for Case, in Universals in Linguistic Theory, Eds. Bach, E.
and Harms, R., Holt, Rinehart and Winston, New York, 1968.

[Gardner  76] 

Gardner, M., Scientific American, June 1976, pages 120-125.

[Green  76] 

Green, C., The Design of the PSI Program Synthesis System, in Second
International Conference on Software Engineering, San Francisco, CA,
October, 1976.

[Green  77] 

Green, C., The Design of the PSI Program Synthesis System, Proceedings of the
Fifth International Joint Conference on Artificial Intelligence, 1977.

[Grishman 76]

Grishman, R., A Survey of Syntactic Analysis Procedures, American Journal of
Linguistics,  Microfiche  47,  1976. 




[Heidorn  74] 

Heidorn, G., English as a Very High Level Language for Simulation Programming,
Proceedings of a Symposium on Very High Level Languages, SIGPLAN Notices,
Vol. 9, No. 4, 1974.

[Heidorn  76] 

Heidorn,  G.,  Automatic  Programming  Through  Natural  Language:  A Survey,  IBM 
Journal  of  Research  and  Development,  Vol.  20,  No.  4,  1976. 

[Hobbs  77] 

Hobbs,  J.,  From  "Well-written"  Algorithm  Descriptions  Into  Code,  Research 
Report  #77-1,  Department  of  Computer  Sciences,  City  University  of  New  York, 
July,  1977. 

[Kant  77] 

Kant,  E.,  The  Selection  of  Efficient  Implementations  for  a High  Level  Language, 
Proceedings  of  Symposium  on  Artificial  Intelligence  and  Programming 
Languages,  SIGPLAN  Notices,  Volume  12,  Number  8,  SIGART  Newsletter,  Number 
64, August 1977.

[Kuno  63] 

Kuno, S., and Oettinger, A., Multiple Path Syntactic Analyzer, in Information
Processing, North-Holland Publishing Co., Amsterdam, 1963.

[McCune  77] 

McCune, B., The PSI Program Model Builder: Synthesis of Very High-level
Programs,  Proceedings  of  Symposium  on  Artificial  Intelligence  and 
Programming  Languages,  SIGPLAN  Notices,  Volume  12,  Number  8,  SIGART 
Newsletter,  Number  64,  August  1977. 

[McCune  78] 

McCune, B., Building Program Models Incrementally from Informal Descriptions,
Ph.D. thesis, AI Memo, CS Report, Artificial Intelligence Laboratory, Computer
Science  Department,  Stanford  University,  Stanford,  California,  to  appear. 

[Malhotra  75] 

Malhotra, A., Design Criteria for a Knowledge-Based English Language System
for Management: An Experimental Analysis, Technical Report TR-146, Project
MAC, MIT, Cambridge, Massachusetts, 1975.

[Manna  77] 

Manna, Z. and Waldinger, R., Synthesis: Dreams => Programs, Memo 302,
Stanford  A.  I.  Project,  Stanford  University,  1977. 

[Marcus  75] 

Marcus,  M.,  Diagnosis  as  a Notion  of  Grammar,  in  Proceedings  of  a Workshop 
on  Theoretical  Issues  in  Natural  Language  Processing,  Eds.  Schank,  R.  and 
Nash-Weber,  B.,  Cambridge,  Mass.,  June,  1975. 




[Phillips  78] 

Phillips, J., The use of inference in automatic programming systems, AI Memo, CS
Report,  Artificial  Intelligence  Laboratory,  Computer  Science  Department,  Stanford 
University,  Stanford,  California,  to  appear. 

[Riesbeck 74]

Riesbeck,  C.,  Computer  Analysis  of  Natural  Language  in  Context,  Memo  238, 
Stanford  A.  I.  Project,  Stanford  University,  1974. 

[Rieger  74] 

Rieger,  C.,  Conceptual  Understanding:  A Theory  and  Computer  Program  for 
Processing  the  Meaning  Content  of  Natural  Language  Utterances,  Memo  233, 
Stanford  A.  I.  Project,  Stanford  University,  1974. 

[Robinson  75] 

Robinson,  J.,  A Tuneable  Performance  Grammar,  SRI  Artificial  Intelligence  Center 
Technical Note 112, 1975.

[Sager  73] 

Sager, N., The String Parser for Scientific Literature, in Rustin, R., Ed., Natural
Language  Processing,  Algorithmics  Press,  1973. 

[Steinberg  78] 

Steinberg, L., A Dialogue Moderator for Program Specification Dialogues in the
PSI System, Ph.D. thesis, AI Memo, CS Report, Artificial Intelligence Laboratory,
Computer Science Department, Stanford University, Stanford, California, to
appear.

[Stockwell  73] 

Stockwell, R., Schachter, P. and Partee, B., The Major Syntactic Structures of
English, Holt, Rinehart and Winston, Inc., 1973.

[Wilks  73] 

Wilks,  Y.,  Preference  Semantics,  Memo  206,  Stanford  A.  I.  Project,  Stanford 
University,  1973. 

[Winograd 72]

Winograd,  T.,  Understanding  Natural  Language,  Academic  Press,  1972. 

[Winston  75] 

Winston, P., Learning Structural Descriptions from Examples, in Winston, P., Ed.,
The Psychology of Computer Vision, McGraw-Hill Book Company, Inc., 1975.

[Woods  70] 

Woods,  W.,  Network  Grammars  for  Language  Analysis,  Communications  of  the 
ACM, Volume 13, Number 10, October 1970.

[Woods  72] 

Woods, W., Kaplan, R. and Nash-Webber, B., The Lunar Sciences Natural Language
Information System, BBN Report No. 2378, 1972.




[Woods  73] 

Woods, W., An Experimental Parsing System for Transition Network Grammars, in
Rustin, R., Ed., Natural Language Processing, Algorithmics Press, 1973.




Appendix  A.  Example  Dialogues 

Here we present a number of dialogues handled by the system. Some are target
dialogues for the PSI system and others were written to illustrate a point in natural
language processing. Each dialogue is followed by the program specification the
Interpreter produced from it.

The first dialogue is one of the original PSI target dialogues. In it, the user wishes
to write a program similar to one described in [Winston 76]. The program, known as
a concept formation program, is "...a type of simple learning program that takes as
input several objects that are instances of some concept and also several objects
that are not instances of the concept. As output, it produces an abstracted
description of the concept so that given a new instance, it can classify that
instance as representing the concept or not" [GREEN 76]. The second dialogue is
another version of the same program. The next three dialogues are PSI target
dialogues which are simpler versions (from a coding standpoint) of concept
formation programs.

Dialogues six and seven are two different ways of specifying a graph traversal
algorithm. The first is a description of what the algorithm is to accomplish. The
second describes how the algorithm should be implemented.

The last four dialogues are all variants of information retrieval programs.




-DIALOGUE  1- 

WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE? TF
DOES TF HAVE A SPECIAL PROGRAM TYPE? Theory formation

DESCRIBE TF.

It inputs a scene, tests whether it fits the concept, verifies the result of this test
with the user, and updates the concept. Then it repeats the process.

WHAT IS THE STRUCTURE OF THE SCENE?

An input scene will just be a collection of relations, each of which is a property name
followed by the names of the objects over which the property holds.

WHAT IS THE STRUCTURE OF THE CONCEPT?

It should be a set of relations, each marked either "possible" or "necessary".

WHAT IS THE INITIAL VALUE OF THE CONCEPT?

Initially the concept has no descriptions at all.

PLEASE DEFINE WHAT IT MEANS FOR A SCENE TO FIT A CONCEPT.

The scene fits the concept if all of the relations in the concept
that are marked "necessary" are part of the scene.

DESCRIBE VERIFYING THE CONCEPT.

Output the result of the test, ask the user if this is correct, and input the user's response.

WHAT IS THE ARGS OF THE ASK?

"correct or wrong?"

WHAT IS THE STRUCTURE OF THE RESPONSE?

The response should be "correct" or "wrong".

DESCRIBE UPDATING THE CONCEPT.

There are four cases. If the scene fit and the user said the guess was correct, then
every relation that is part of the scene but not part of the concept should be added to
the concept and marked "possible".

DESCRIBE THE NEXT CASE

The next case is if the scene fit and the user said "wrong"; then find a relation in the
concept which is marked "possible" and which is not in the scene. Change its marking to "necessary".

DESCRIBE THE NEXT CASE

If the scene did not fit, and the user said "correct", then do nothing.

DESCRIBE THE NEXT CASE.

If the scene did not fit, and the user said "wrong", then for each relation in the concept
marked "necessary" that is not in the scene, re-mark it "possible".

WHAT IS THE FORMAT OF THE OUTPUT?

Output "fit" if the scene fit, "didn't fit" if it didn't.

WHAT IS THE EXIT TEST OF THE LOOP?

Stop when the user types "quit" instead of an input scene.

ARE  WE  FINISHED?  yes 




TF

A1 ← PHI
LOOP1:
   PRINT("Ready for the SCENE")
   A2 ← READ()
   If EQUAL(A2 "quit")
      Then GO EXIT1
   A4 ← FIT(A2 A1)
   Cases: If A4
             Then PRINT("fit")
          else If NOT(A4)
             Then PRINT("didn't fit")
   PRINT("correct or wrong?")
   A5 ← READ()
   Cases: If AND(A4 EQUAL(A5 "correct"))
             Then A11 ← The set of all A10 in A2 such that:
                           NOT(MEMBER(A10 A1))
                  For all A10 in A11 do:
                     A1 ← INSERT(A10 A1)
                     MAP(A3 A10 "possible")
          else If AND(A4 EQUAL(A5 "wrong"))
             Then A7 ← The set of any 1 A6 in A1 such that:
                          AND(NOT(MEMBER(A6 A2))
                              EQUAL(IMAP(A3 A6) "possible"))
                  For all A6 in A7 do:
                     MAP(A3 A6 "necessary")
          else If AND(NOT(A4)
                      EQUAL(A5 "correct"))
             Then NIL
          else If AND(NOT(A4)
                      EQUAL(A5 "wrong"))
             Then A9 ← The set of all A8 in A1 such that:
                          AND(EQUAL(IMAP(A3 A8) "necessary")
                              NOT(MEMBER(A8 A2)))
                  For all A8 in A9 do:
                     MAP(A3 A8 "possible")
   Goto LOOP1
EXIT1:

FIT(B1 B2)
   FORALL(B3) IMPLIES(AND(EQUAL(IMAP(A3 B3) "necessary")
                          MEMBER(B3 B2))
                      MEMBER(B3 B1))

A5 is either a string whose value is "wrong" or a string whose value is "correct".

A4 is either TRUE or FALSE. B1 is a variable bound to A2.

A3 is a mapping from the elements of A1 to either a string whose value
is "necessary" or a string whose value is "possible".

B3 is a variable bound to A12. B2 is a variable bound to A1.

A2 is either a set whose generic element is a record whose fields are
PROPERTY: a primitive name and NAME-SET: a set whose generic element is
a primitive name or a string whose value is "quit".

A1 is a set whose generic element is a record whose fields are PROPERTY:
a primitive name and NAME-SET: a set whose generic element is a
primitive name with assertions:

   EQUAL(IMAP(A3 A12) A13)

A13 is either a string whose value is "necessary" or a string whose
value is "possible".

A12 is the generic element of A1.
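The four update cases are easier to check against the dialogue when written out in a conventional language. The following is a minimal Python sketch of FIT and the concept update, offered as an illustration rather than as code PSI would produce. It assumes a relation is a tuple (a property name followed by object names) and that the concept is a dict from each relation to its marking, standing in for both the set A1 and the mapping A3; the names fits and update are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a relation is a property name followed by
# the names of the objects over which it holds, e.g. ("ON", "a", "b").
Relation = Tuple[str, ...]
# The concept maps each of its relations to "possible" or "necessary",
# standing in for both the set A1 and the marking map A3.
Concept = Dict[Relation, str]


def fits(scene: Set[Relation], concept: Concept) -> bool:
    """FIT: every relation marked "necessary" in the concept is in the scene."""
    return all(rel in scene for rel, mark in concept.items() if mark == "necessary")


def update(scene: Set[Relation], concept: Concept, fit: bool, response: str) -> None:
    """Apply the four update cases from the dialogue, changing concept in place."""
    if fit and response == "correct":
        # Add each scene relation not yet in the concept, marked "possible".
        for rel in scene - set(concept):
            concept[rel] = "possible"
    elif fit and response == "wrong":
        # Pick any one "possible" relation missing from the scene
        # (here: the first in insertion order) and mark it "necessary".
        candidates = [r for r, m in concept.items() if m == "possible" and r not in scene]
        if candidates:
            concept[candidates[0]] = "necessary"
    elif not fit and response == "correct":
        pass  # Nothing to change.
    elif not fit and response == "wrong":
        # Re-mark every "necessary" relation missing from the scene as "possible".
        for rel, mark in list(concept.items()):
            if mark == "necessary" and rel not in scene:
                concept[rel] = "possible"
```

Calling fits on a new scene and then update with the user's reply corresponds to one pass through LOOP1; the loop itself and the "quit" test are left out of the sketch.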




-DIALOGUE  2- 

WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE?

Arch.

DOES ARCH HAVE A SPECIAL PROGRAM TYPE?

Theory formation.

DESCRIBE ARCH.

The program is a loop. On each iteration, it reads in a
trial-item, matches the input to the internal concept model, and
prints the result of the match. The user must then inform the
program whether or not the guess is correct, and the program
modifies its model of the concept accordingly.

WHAT IS THE STRUCTURE OF THE TRIAL-ITEM?

A trial-item is a set of relations.

WHAT IS THE STRUCTURE OF THE MODEL?

The model is a set of relations, each marked "necessary" or
"possible".

WHAT IS THE STRUCTURE OF THE RESPONSE?

"correct" or "wrong".

WHAT IS THE STRUCTURE OF THE RELATIONS IN THE TRIAL-ITEM?

a relation is a property name and a list of atoms.

WHAT IS THE INITIAL VALUE OF THE MODEL?

Nil.

PLEASE DEFINE WHAT IT MEANS FOR A SCENE TO MATCH A CONCEPT.

A trial-item matches the model if all the relations in the model
that are marked "necessary" are elements of the trial-item.

WHAT IS THE FORMAT OF THE PRINT?

Type "the trial-item matches" if the trial-item matched and
"the trial-item does not match" if it didn't.

DESCRIBE MODIFYING THE MODEL

If the trial-item matched and the user said the guess was correct,
then every relation in the trial-item which is not in the model
should be put in the model and marked "possible". If the
trial-item matched and the user said "wrong", then change the
marking of a relation marked "possible" which is in the model and
not in the trial-item to "necessary". If the trial-item didn't
match, and the user said "correct", then do nothing. If the
trial-item did not match, and the user said "wrong", then re-mark
each relation in the model marked "necessary" that is not in the
trial-item "possible".

WHAT IS THE EXIT TEST OF THE LOOP?

stop when the user types "quit" instead of a trial-item.

ARE WE FINISHED? yes.




ARCH

A1 ← PHI
LOOP1:
   PRINT("Ready for the TRIAL-ITEM")
   A2 ← READ()
   If EQUAL(A2 "quit")
      Then GO EXIT1
   A4 ← MATCH(A2 A1)
   Cases: If A4
             Then PRINT("the trial-item matches")
          else If NOT(A4)
             Then PRINT("the trial-item does not match")
   A5 ← READ()
   Cases: If AND(A4 EQUAL(A5 "correct"))
             Then A11 ← The set of all A10 in A2 such that:
                           NOT(MEMBER(A10 A1))
                  For all A10 in A11 do:
                     A1 ← INSERT(A10 A1)
                     MAP(A3 A10 "possible")
          else If AND(A4 EQUAL(A5 "wrong"))
             Then A7 ← The set of any 1 A6 in A1 such that:
                          AND(NOT(MEMBER(A6 A2))
                              EQUAL(IMAP(A3 A6) "possible"))
                  For all A6 in A7 do:
                     MAP(A3 A6 "necessary")
          else If AND(NOT(A4)
                      EQUAL(A5 "correct"))
             Then NIL
          else If AND(NOT(A4)
                      EQUAL(A5 "wrong"))
             Then A9 ← The set of all A8 in A1 such that:
                          AND(EQUAL(IMAP(A3 A8) "necessary")
                              NOT(MEMBER(A8 A2)))
                  For all A8 in A9 do:
                     MAP(A3 A8 "possible")
   Goto LOOP1
EXIT1:

MATCH(B1 B2)
   FORALL(B3) IMPLIES(AND(EQUAL(IMAP(A3 B3) "necessary")
                          MEMBER(B3 B2))
                      MEMBER(B3 B1))

A5 is either a string whose value is "wrong" or a string whose value is "correct".

A4 is either TRUE or FALSE. B3 is a variable bound to A12.

A3 is a mapping from the elements of A1 to either a string whose value
is "possible" or a string whose value is "necessary".

B2 is a variable bound to A1. B1 is a variable bound to A2.

A2 is either a set whose generic element is a record whose fields are
PROPERTY: a primitive name and ATOM-LIST: a list whose generic element
is a primitive or a string whose value is "quit".

A1 is a set whose generic element is a record whose fields are PROPERTY:
a primitive name and ATOM-LIST: a list whose generic element is a
primitive with assertions:

   EXISTS(B4) EQUAL(IMAP(A3 B4) A13)

A13 is either a string whose value is "possible" or a string whose value is "necessary".
B4 is a variable bound to A12. A12 is the generic element of A1.




-DIALOGUE  3- 

WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE? CLASS
DOES CLASS HAVE A SPECIAL PROGRAM TYPE? No

DESCRIBE CLASS.

CLASS first inputs a concept. Then it repeatedly accepts an input scene from the user, classifies
it based on whether or not it fits the concept, and outputs this classification to the user.

WHAT IS THE STRUCTURE OF THE CONCEPT?

The concept will just be a collection of relations, each of which is a property name
followed by the names of the objects over which the property holds.

WHAT IS THE STRUCTURE OF THE SCENE?

The scene has the same structure as the concept.

PLEASE DESCRIBE WHAT IT MEANS FOR A SCENE TO FIT A CONCEPT.

The scene fits the concept if all of the relations in the concept are part of the scene.

WHAT IS THE FORMAT OF THE OUTPUT?

Output "Fit" if the scene fit, "Didn't fit" if it didn't.

WHAT IS THE EXIT TEST OF THE LOOP?

Stop when the user types "Quit" instead of an input scene.

ARE WE FINISHED? yes

CLASS

PRINT("Ready for the CONCEPT")

A1 ← READ()

LOOP1:
   PRINT("Ready for the SCENE")
   A2 ← READ()
   If EQUAL(A2 "Quit")
      Then GO EXIT1
   A3 ← FIT(A2 A1)
   Cases: If A3
             Then PRINT("Fit")
          else If NOT(A3)
             Then PRINT("Didn't fit")
   Goto LOOP1
EXIT1:

FIT(B1 B2)
   FORALL(B3) IMPLIES(MEMBER(B3 B2)
                      MEMBER(B3 B1))

A3 is either TRUE or FALSE. B1 is a variable bound to A2.

B3 is a variable bound to A4. B2 is a variable bound to A1.

A2 is either a set whose generic element is a record whose fields are
PROPERTY: a primitive name and NAME-SET: a set whose generic element is
a primitive name or a string whose value is "Quit".

A1 is a set whose generic element is a record whose fields are PROPERTY:
a primitive name and NAME-SET: a set whose generic element is a primitive name.

A4 is the generic element of A1.
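In CLASS the FIT definition FORALL(B3) IMPLIES(MEMBER(B3 B2) MEMBER(B3 B1)) says that every relation of the concept is also a relation of the scene, that is, the concept is a subset of the scene. A small Python sketch of the whole loop follows. It assumes relations are tuples, that the scenes arrive as an already-read list rather than from the terminal, and that outputs are collected in a list instead of printed; fit and run_class are hypothetical names.

```python
from typing import FrozenSet, Iterable, List, Tuple, Union

Relation = Tuple[str, ...]
Scene = FrozenSet[Relation]


def fit(scene: Scene, concept: Scene) -> bool:
    # FORALL(B3) IMPLIES(MEMBER(B3 concept) MEMBER(B3 scene)),
    # with B3 ranging over the concept; equivalent to concept <= scene.
    return all(rel in scene for rel in concept)


def run_class(concept: Scene, inputs: Iterable[Union[Scene, str]]) -> List[str]:
    """Classify each input scene until the user types "Quit"."""
    outputs: List[str] = []
    for item in inputs:
        if item == "Quit":  # exit test of the loop
            break
        outputs.append("Fit" if fit(item, concept) else "Didn't fit")
    return outputs
```

The test in CLASSO is the same subset check with the roles reversed: every element of the input set must be in the concept.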




-DIALOGUE  4- 

WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE?
Class1

DOES CLASS1 HAVE A SPECIAL PROGRAM TYPE?
no

DESCRIBE CLASS1.

It reads a concept. Then it repeatedly reads a string and tests if it
is in the concept. If it is, it types "Its in!". Otherwise, it types
"not in". The concept is a collection of strings.

WHAT IS THE EXIT TEST OF THE LOOP?

Stop when the user types "quit"

ARE WE FINISHED?

Yes. 


CLASS1

PRINT("Ready for the CONCEPT")

A1 ← READ()
LOOP1:
   PRINT("Ready for the STRING")
   A2 ← READ()
   If EQUAL(A2 "quit")
      Then GO EXIT1
   A3 ← MEMBER(A2 A1)
   If A3
      Then PRINT("Its in!")
      else PRINT("not in")
   Goto LOOP1
EXIT1:


A3 is either TRUE or FALSE.

A2 is either a string or a string whose value is "quit".
A1 is a set whose generic element is a string.




-DIALOGUE  5- 

WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE?
ClassO.

DOES CLASSO HAVE A SPECIAL PROGRAM TYPE?

no.

DESCRIBE CLASSO.

It reads a concept,

WHAT IS THE STRUCTURE OF THE CONCEPT?

a collection of strings,

ARE WE FINISHED?

Then it repeatedly reads a set of strings and prints "Its a subset" if all the
elements in the set are in the concept. Otherwise it prints "not a subset".

WHAT IS THE EXIT TEST OF THE LOOP?

Stop when the user types "quit".

ARE WE FINISHED?

Yes.


CLASSO

PRINT("Ready for the CONCEPT")
A1 ← READ()
LOOP1:
   PRINT("Ready for the ELEMENT-SET")
   A2 ← READ()
   If EQUAL(A2 "quit")
      Then GO EXIT1
   If FORALL(B1) IMPLIES(MEMBER(B1 A2)
                         MEMBER(B1 A1))
      Then PRINT("Its a subset")
      else PRINT("not a subset")
   Goto LOOP1
EXIT1:


B1 is a variable bound to A3.

A2 is either a set whose generic element is a string or a string whose
value is "quit".

A1 is a set whose generic element is a string.

A3 is the generic element of A2.




-DIALOGUE  6- 

WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE? Findnodes
DOES FINDNODES HAVE A SPECIAL PROGRAM TYPE? no

DESCRIBE FINDNODES.

The program should input a graph and a node. A graph is a set of pairs.
Each pair consists of two nodes, which are primitives. The program
outputs a list of all the nodes which can be reached from the input node.

PLEASE DESCRIBE WHAT IT MEANS FOR A NODE TO BE REACHED FROM ANOTHER NODE.

A node X is connected to a node Y if there exists a pair in the graph
such that X and Y are in the pair. X can be reached from Y if X is
connected to Y or if X can be reached from a node which is connected to Y.

ARE WE FINISHED? Yes.


FINDNODES

PRINT("Ready for the GRAPH and the NODE")

A1 ← READ()
A2 ← READ()

REflCH*A3  02) * ' ' B3  ^ *"  R4  ^ 31  *UCh  ,ha,! 
PRINT(A5)


REACH(B1 B2)
   OR(CONNECT(B1 B2)
      EXISTS(B6) AND(CONNECT(B6 B2)
                     REACH(B1 B6)))

CONNECT(B3 B4)
   EXISTS(B5) AND(MEMBER(B5 A1)
                  MEMBER(B3 B5)
                  MEMBER(B4 B5))


B6 is a variable bound to A6. B5 is a variable bound to A7.

B4 is a variable bound to Y. B3 is a variable bound to X.

B2 is a variable bound to Y. B1 is a variable bound to X.

A4 is the generic element of A1. A2 is a primitive.

A1 is a set whose generic element is a set of size 2 whose generic
element is a primitive.

X is a primitive. Y is a primitive.

A7 is a set of size 2 whose generic element is a primitive.
A6 is a primitive.
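Evaluated by direct recursion, the REACH definition can loop forever: CONNECT is symmetric, so the recursion can step back and forth along the same pair. The Python sketch below computes the same relation breadth-first instead. The use of breadth-first search, the representation of the graph as a set of two-element frozensets, and the names connected and reachable are assumptions made for illustration. Read literally, CONNECT(X X) holds for any X that appears in some pair, so the input node itself counts as reachable whenever it appears in the graph; the sketch keeps that reading.

```python
from collections import deque
from typing import FrozenSet, Hashable, Set

Node = Hashable
Graph = Set[FrozenSet[Node]]  # a set of pairs; each pair is a set of two nodes


def connected(graph: Graph, x: Node, y: Node) -> bool:
    """CONNECT(X Y): some pair in the graph contains both X and Y."""
    return any(x in pair and y in pair for pair in graph)


def reachable(graph: Graph, start: Node) -> Set[Node]:
    """Every X with REACH(X start), found breadth-first instead of by recursion."""
    nodes = {n for pair in graph for n in pair}
    reached: Set[Node] = set()
    queue = deque([start])
    while queue:
        y = queue.popleft()
        for x in nodes:
            # x reaches start if it is connected to start or to a node already
            # known to reach start. Each node is added to reached at most once,
            # so the loop terminates even on cyclic graphs.
            if x not in reached and connected(graph, x, y):
                reached.add(x)
                queue.append(x)
    return reached
```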




-DIALOGUE  7- 

WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE? Reach.

DOES REACH HAVE A SPECIAL PROGRAM TYPE? No.

DESCRIBE REACH.

It reads a graph and a node. The graph is a set of nodes and a mapping which maps a pair of
nodes into an edge. Nodes and edges are primitives. The program puts the input node into
a set Nodes. Then it repeatedly finds a node in Nodes which is not marked "done", marks it
"done", and places all the graph nodes connected to it which are not in Nodes into Nodes.

WHAT IS THE INITIAL VALUE OF NODES? Nil.

PLEASE DEFINE WHAT IT MEANS FOR A NODE TO BE CONNECTED TO ANOTHER NODE.

Two nodes are connected if they are mapped into an edge.

WHAT IS THE EXIT TEST OF THE LOOP?

Stop when there are no nodes in Nodes which are not marked "done".

ARE WE FINISHED? Output Nodes

WHICH ONE OF EITHER REACH OR THE LOOP DOES THE OUTPUT BELONG IN? Reach.

REACH

NODES ← PHI

PRINT("Ready for the GRAPH and the NODE")

A1 ← READ()

A2 ← READ()

NODES ← INSERT(A2 NODES)

LOOP1:
   A4 ← The set of any 1 A3 in NODES such that:
           NOT(EQUAL(IMAP(A5 A3) "done"))
   If EQUAL(A4 PHI)
      Then GO EXIT1
   For all A3 in A4 do:
      MAP(A5 A3 "done")
   A8 ← The set of all A6 in A7 such that:
           AND(FORALL(B1) IMPLIES(MEMBER(B1 A4)
                                  CONNECT(A6 B1))
               NOT(MEMBER(A6 NODES)))
   For all A6 in A8 do:
      NODES ← INSERT(A6 NODES)
   Goto LOOP1
EXIT1:

PRINT(NODES)

CONNECT(B2 B3) EXISTS(B4) EQUAL(IMAP(A9 {B2 B3}) B4)

A9 is the DATA of the MAPPING field of A1. B4 is a variable bound to A10. A2 is a primitive.

B2 is a variable bound to A12. B1 is a variable bound to A3. B3 is a variable bound to A11.

A7 is the DATA of the NODE-SET field of A1. NODES is a set whose generic element is a primitive.

A5 is a mapping from the elements of A4 to a string whose value is "done".

A1 is a record whose fields are NODE-SET: a set whose generic element is a primitive and
MAPPING: a mapping from a set of size 2 whose generic element is a primitive to a primitive.

A12 is a primitive. A11 is a primitive. A10 is a primitive.
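The procedural description in this dialogue can be sketched in Python as follows. This is an illustration under stated assumptions, not PSI output: the MAPPING field is taken to be a dict keyed by two-element frozensets (so CONNECT, which asks whether the pair is mapped into some edge, becomes a key lookup), a dict of marks stands in for the mapping A5, and reach and its parameter names are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Set

Node = Hashable


def reach(node_set: Set[Node], mapping: Dict[FrozenSet[Node], Hashable],
          start: Node) -> Set[Node]:
    """Worklist traversal: mark one unmarked node "done" per iteration."""

    def connect(a: Node, b: Node) -> bool:
        # Two nodes are connected if their pair is mapped into an edge.
        return frozenset((a, b)) in mapping

    nodes: Set[Node] = {start}   # NODES <- INSERT(A2 NODES)
    marks: Dict[Node, str] = {}  # stands in for the mapping A5
    while True:
        pending = [n for n in nodes if marks.get(n) != "done"]
        if not pending:          # exit test: every node in NODES is "done"
            break
        current = pending[0]     # "any 1" node not yet marked "done"
        marks[current] = "done"
        for other in node_set:   # A7, the NODE-SET field of the graph
            if other not in nodes and connect(other, current):
                nodes.add(other)
    return nodes
```

The input node is always part of the result, since it is inserted into NODES before the loop begins.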




-DIALOGUE  8- 

WHAT  IS  THE  NAME  OF  THE  PROGRAM  YOU  WISH  TO  WRITE? 
COOKBOOK 

DOES  COOKBOOK  HAVE  A SPECIAL  PROGRAM  TYPE? 


DESCRIBE COOKBOOK.

Cookbook inputs a recipe list, and then repeatedly inputs a name and
prints the recipe with that name.

WHAT IS THE STRUCTURE OF THE RECIPES IN THE LIST?

A recipe has a name, an ingredient-list, and directions. The name and the directions are
strings of characters. An ingredient-list is a list whose elements have an ingredient and an amount.

WHAT IS THE STRUCTURE OF THE INGREDIENT?

An ingredient and an amount are both strings of characters.

WHAT IS THE EXIT TEST OF THE LOOP?

Stop when the user types "quit" as a recipe name.

ARE WE FINISHED? Yes.


(COOKBOOK NIL

PRINT("Ready for the RECIPE-LIST")
A1 ← READ()
LOOP1:
   PRINT("Ready for the NAME")
   A2 ← READ()
   If EQUAL(A2 "quit")
      Then GO EXIT1
   A4 ← The set of all A3 in A1 such that:
           EQUAL(A2 FETCH(A3 NAME))
   For all A3 in A4 do:
      PRINT(A3)
   Goto LOOP1
EXIT1:

)

A2 is either a primitive name or a string whose value is "quit".

A1 is a list whose generic element is a record whose fields are NAME: a
string, INGREDIENT-LIST: a list whose generic element is a record whose
fields are INGREDIENT: a string and AMOUNT: a string, and
DIRECTION-SET: a set whose generic element is a string.




-DIALOGUE  9- 

WHAT  IS  THE  NAME  OF  THE  PROGRAM  YOU  WISH  TO  WRITE? 

Recipe. 

DOES RECIPE HAVE A SPECIAL PROGRAM TYPE?

no.

DESCRIBE RECIPE.

It reads in a recipe list. A recipe consists of a name and a string.

Then it enters a loop which reads a recipe name and prints the recipe
with that name. If there is no recipe with that name, print "no such recipe".

WHAT IS THE EXIT TEST OF THE LOOP?

Stop when the user types "quit".

are  we  finished? 

Yes. 


RECIPE

PRINT("Ready for the RECIPE-LIST")

A1 ← READ()

LOOP1:
   PRINT("Ready for the NAME")
   A2 ← READ()
   If EQUAL(A2 "quit")
      Then GO EXIT1
   A4 ← The set of all A3 in A1 such that:
           EQUAL(FETCH(A3 NAME) A2)
   If EQUAL(A4 PHI)
      Then PRINT("no such recipe")
      else For all A3 in A4 do:
              PRINT(A3)
   Goto LOOP1
EXIT1:


A2 is either a primitive name or a string whose value is "quit".

A1 is a list whose generic element is a record whose fields are NAME: a
primitive name and STRING: a string.
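The RECIPE loop differs from COOKBOOK mainly in the empty-set test EQUAL(A4 PHI). A Python sketch of the lookup, assuming recipes are records with a name and a text field, that queries are passed in as a list rather than read at the terminal, and that the text of each matching recipe is collected in place of printing the whole record; Recipe and recipe_lookup are hypothetical names.

```python
from typing import Iterable, List, NamedTuple


class Recipe(NamedTuple):
    # Hypothetical record with the NAME and STRING fields of A1's elements.
    name: str
    text: str


def recipe_lookup(recipes: List[Recipe], queries: Iterable[str]) -> List[str]:
    """Look up each queried name until "quit"; return what would be printed."""
    out: List[str] = []
    for name in queries:
        if name == "quit":                                # exit test
            break
        matches = [r for r in recipes if r.name == name]  # A4
        if not matches:                                   # EQUAL(A4 PHI)
            out.append("no such recipe")
        else:
            out.extend(r.text for r in matches)           # each match
    return out
```

Dropping the empty-set branch gives the behaviour of COOKBOOK, which prints nothing when no recipe matches.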




-DIALOGUE  10- 

WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE? News.

DOES NEWS HAVE A SPECIAL PROGRAM TYPE? News retrieval.

DESCRIBE NEWS.

It answers questions about a data base.

WHAT IS THE STRUCTURE OF THE GENERIC ELEMENT OF THE DATA-BASE?

The data base is a list of descriptors, each of which describes one newspaper story.

WHAT IS THE STRUCTURE OF THE DESCRIPTOR?

A descriptor consists of a set of features.

WHAT IS THE STRUCTURE OF THE QUESTION?

A feature and a flag.

WHAT IS THE STRUCTURE OF THE FEATURES IN THE DESCRIPTOR?

Each feature consists of a property and a value. A property is a name which
is a character string. A value is either a number or a string of characters.

WHAT IS THE STRUCTURE OF THE FLAG?

The flag will be either "descriptors" or "number".

WHAT IS THE INITIAL VALUE OF THE DATA-BASE? Unnecessary.

DESCRIBE THE ANSWER.

If the flag is "descriptors", then the program should print each
descriptor that contains the feature.

ARE WE FINISHED?

If the flag is "number", then it should just print the total
number of such descriptors.

ARE WE FINISHED? Yes.

NEWS

PRINT("Ready for the QUESTION")

A1 ← READ()

A4 ← The set of all A2 in A3 such that:
        MEMBER(A5 A2)

If EQUAL(A6 "descriptors")
   Then For all A2 in A4 do:
           PRINT(A2)

If EQUAL(A6 "number")
   Then PRINT(A7)

A6 is the DATA of the FLAG field of A1. A5 is the DATA of the FEATURE field of A1.
A7 is a number which is the size of A4.

A3 is a list whose generic element is a set whose generic element is a record
whose fields are PROPERTY: a string and VALUE: either a string or a number.

A1 is a record whose fields are FEATURE: a record whose fields are PROPERTY:
a string and VALUE: either a string or a number and FLAG: either a string
whose value is "number" or a string whose value is "descriptors".
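A Python sketch of how one question is answered, under the assumptions that a feature is a (property, value) tuple, a descriptor is a frozenset of features, and the result is returned rather than printed; answer is a hypothetical name. As in the specification, a flag other than "descriptors" or "number" produces nothing.

```python
from typing import FrozenSet, List, Optional, Tuple, Union

Value = Union[str, int, float]
Feature = Tuple[str, Value]        # (PROPERTY, VALUE)
Descriptor = FrozenSet[Feature]    # the features of one newspaper story


def answer(database: List[Descriptor], feature: Feature,
           flag: str) -> Optional[Union[List[Descriptor], int]]:
    """Answer one question: a feature plus a flag."""
    matches = [d for d in database if feature in d]  # A4
    if flag == "descriptors":
        return matches                               # each matching descriptor
    if flag == "number":
        return len(matches)                          # A7, the size of A4
    return None  # neither branch of the specification applies
```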




-DIALOGUE  11- 

WHAT  IS  THE  NAME  OF  THE  PROGRAM  YOU  WISH  TO  WRITE? 

Write me a news retrieval program

WHAT IS THE NAME OF THE PROGRAM YOU WISH TO WRITE? News
DESCRIBE NEWS.

It has a data base of news stories. Each story has a set of key words associated with it. I'll
request a story by giving a key word. The response should be all the stories with that key word.
If the command is "number" followed by a key word, then the number of stories having that
key word should be printed.

WHAT IS THE STRUCTURE OF THE STORYS IN THE DATA-BASE?
they are strings.

WHAT IS THE STRUCTURE OF THE KEY-WORDS IN THE SET?
primitive.

WHAT IS THE INITIAL VALUE OF THE DATA-BASE?

unnecessary.

WHAT IS THE INITIAL VALUE OF THE KEY WORD SET?

unnecessary.

ARE WE FINISHED? yes


NEWS

PRINT("Ready for the KEY-WORD")
A1 ← READ()

Cases: If EQUAL(A1 A11)
          Then A10 ← The set of all A9 in A3 such that:
                        MEMBER(A11 IMAP(A6 A9))
               PRINT(A10)
       else If EQUAL(A1 A8)
          Then A4 ← The set of all A2 in A3 such that:
                       MEMBER(A5 IMAP(A6 A2))
               PRINT(A7)

A11 is a primitive.

A8 is a record whose fields are STRING: a string whose value is "number"
and KEY-WORD: a primitive.

A7 is a number which is the size of A4.

A6 is a mapping from the elements of A3 to a set whose generic element is a primitive.

A5 is the DATA of the KEY-WORD field of A8.

A3 is a set whose generic element is a string. A1 is either A11 or A8.
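In this version the Interpreter represents the key words as a mapping A6 from stories to key-word sets, and a command is either a bare key word (A11) or a record holding the string "number" and a key word (A8). A Python sketch under those assumptions, with the story data base modelled as a dict from story to key-word set, the "number" record modelled as a tuple, and handle a hypothetical name:

```python
from typing import Dict, List, Set, Tuple, Union

# A command is a bare key word, or ("number", key word) for a count.
Command = Union[str, Tuple[str, str]]


def handle(stories: Dict[str, Set[str]], command: Command) -> Union[List[str], int]:
    """stories maps each story (a string) to its key words, like the mapping A6."""
    if isinstance(command, tuple) and command[0] == "number":
        key = command[1]
        # A7: the size of the set of stories having the key word.
        return sum(1 for words in stories.values() if key in words)
    # A10: every story whose key-word set contains the command.
    return [story for story, words in stories.items() if command in words]
```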


